Preface 


We are pleased to present the proceedings of the 20th International Conference on 
Applied Cryptography and Network Security (ACNS 2022). ACNS 2022 was held 
in Rome, Italy. Due to the ongoing COVID-19 crisis, we decided to have a hybrid 
conference to face any health risks or travel restrictions for attending the conference. 
The organization was in the capable hands of Mauro Conti (University of Padua, Italy) 
and Angelo Spognardi (Sapienza University of Rome, Italy) as general co-chairs, and 
Massimo Bernaschi (National Research Council, IAC-CNR, Italy) and Fabio De Gaspari 
(Sapienza University of Rome, Italy) as local organizing chairs. We are deeply indebted 
to them for their tireless work to ensure the success of the conference even in such 
complex conditions. 

For the third time, ACNS had two rounds of submission cycles, with deadlines in 
September 2021 and January 2022, respectively. We received a total of 185 submissions 
from authors in 37 countries. This year’s Program Committee (PC) consisted of around 
150 members with diverse backgrounds and broad research interests. The review process 
was double-blind and rigorous, and papers were evaluated on the basis of research 
significance, novelty, and technical quality. In total, 691 reviews were submitted, with 
four reviews for most papers. Some papers submitted in the first round received a decision 
of major revision. The revised versions of those papers were further evaluated in the 
second round and some of them were accepted. After the review process concluded, a 
total of 44 papers were accepted to be presented at the conference and included in the 
proceedings, representing an acceptance rate of around 24%. 

Among those papers, we awarded the Best Student Paper Award to Narmeen Shafqat 
(Northeastern University, Boston, MA, USA) for the paper “ZLeaks: Passive Inference 
Attacks on Zigbee based Smart Homes” (co-authored with Daniel J. Dubois, David 
Choffnes, Aaron Schulman, Dinesh Bharadia, and Aanjhan Ranganathan). The monetary 
prize of 1,000 euro was generously sponsored by Springer. 

We had a rich program including eight satellite workshops in parallel with 
the main event, providing a forum to address specific topics at the forefront of 
cybersecurity research. The papers presented at those workshops were published in 
separate proceedings. 

This year we had two outstanding keynote talks: “Chosen Ciphertext Security from 
Injective Trapdoor Functions” by Prof. Susan Hohenberger Waters (Johns Hopkins Uni- 
versity, USA), and “Secure Computation in Practice” by Prof. Raluca Ada Popa (Univer- 
sity of California, Berkeley, USA). To them, our heartfelt gratitude for their outstanding 
presentations. 

The conference was made possible by the untiring efforts of many individuals and 
organizations. We are grateful to all the authors for their submissions. We sincerely 
appreciate the outstanding work of all the PC members and the external reviewers, who 
selected the papers after reading, commenting, and debating them. Finally, we thank 
all the people who volunteered their time and energy to put together the conference, 
the speakers and session chairs, and everyone who contributed to the success of the 
conference. We are also grateful to Riccardo Lazzeretti (Sapienza University of Rome, 
Italy) for taking care of these proceedings. Last, but certainly not least, we are very 
grateful to Frontiers for sponsoring the conference, EasyChair for the management of 
the submissions, and Springer for their help in assembling these proceedings. 
Encryption 


Keyed-Fully Homomorphic Encryption 
Without Indistinguishability Obfuscation 


Shingo Sato¹, Keita Emura², and Atsushi Takayasu³ 


¹ Yokohama National University, Yokohama, Japan 
sato-shingo-zk@ynu.ac.jp 
² National Institute of Information and Communications 
Technology (NICT), Koganei, Japan 
k-emura@nict.go.jp 
³ The University of Tokyo, Bunkyo-ku, Japan 
takayasu-a@g.ecc.u-tokyo.ac.jp 


Abstract. (Fully) homomorphic encryption ((F)HE) allows users to 
publicly evaluate circuits on encrypted data. Although public homo- 
morphic evaluation property has various applications, (F)HE cannot 
achieve security against chosen ciphertext attacks (CCA2) due to its 
nature. To achieve both the CCA2 security and homomorphic evalua- 
tion property, Emura et al. (PKC 2013) introduced keyed-homomorphic 
public key encryption (KH-PKE) and formalized its security denoted 
by KH-CCA security. KH-PKE has a homomorphic evaluation key that 
enables users to perform homomorphic operations. Intuitively, KH-PKE 
achieves the CCA2 security unless adversaries have a homomorphic eval- 
uation key. Although Lai et al. (PKC 2016) proposed the first keyed- 
fully homomorphic encryption (keyed-FHE) scheme, its security relies 
on the indistinguishability obfuscation (iO), and this scheme satisfies a 
weak variant of KH-CCA security. Here, we propose a generic construc- 
tion of a KH-CCA secure keyed-FHE scheme from an FHE scheme secure 
against non-adaptive chosen ciphertext attack (CCA1) and a strong dual- 
system simulation-sound non-interactive zero-knowledge (strong DSS- 
NIZK) argument system by using the Naor-Yung paradigm. We show 
that there are a strong DSS-NIZK and an IND-CCA1 secure FHE scheme 
that are suitable for our generic construction. This shows that there exists 
a keyed-FHE scheme from simpler primitives than iO. 


Keywords: Keyed-homomorphic public key encryption · Keyed-fully 
homomorphic encryption · Strong DSS-NIZK 


1 Introduction 


1.1 Background 




Homomorphic encryption (HE) allows users to convert encryptions of messages 
$m_1, \ldots, m_\ell$ into an encryption of $C(m_1, \ldots, m_\ell)$ publicly for some circuit $C$. In 
particular, fully homomorphic encryption (FHE) can be used to handle arbitrary 
circuits. The public homomorphic evaluation property has various applications. 
For example, if encryptions of private data are stored on a remote server, 
computations on the encrypted data can be delegated to the server without 
revealing the private data. Thus, users can leverage the results of computations 
performed on other devices without compromising data privacy. Since Gentry 
proposed the first FHE scheme [24], the research area has gained widespread 
attention and many schemes have been proposed (e.g., FHE schemes [5,7–11,19,24,25], 
identity-based FHE (IBFHE) schemes [15,25], and attribute-based 
FHE schemes [6,25]), where most schemes are secure under the learning with 
errors (LWE) assumption. Although the public evaluation property is useful, one 
downside is that (F)HE schemes are vulnerable against adaptive chosen cipher- 
text attacks (CCA). (In this paper, we use IND-CCA2 or IND-CCA, IND-CCA1, 
and IND-CPA as indistinguishability against adaptive chosen ciphertext attacks, 
non-adaptive chosen ciphertext (i.e., lunchtime) attacks, and chosen-plaintext 
attacks, respectively). Therefore, several IND-CCA1 secure (F)HE schemes have 
been proposed. For example, Canetti et al. [11] proposed a generic construction 
of IND-CCA1 secure FHE from the LWE assumption or a zero-knowledge suc- 
cinct non-interactive argument of knowledge (zk-SNARK) [3,4] and IND-CPA 
secure FHE. However, IND-CCA1 security can be inadequate for FHE since Lof- 
tus et al. [33] showed that an IND-CCA1 secure FHE scheme is vulnerable against 
ciphertext validity attacks. 
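
To make the tension between public homomorphic evaluation and security against adaptive chosen ciphertext attacks concrete, the following minimal sketch uses textbook ElGamal over a toy group. The parameters (p = 2039, q = 1019, g = 4) and all function names are illustrative assumptions and are not taken from any scheme cited here. The adversary multiplies the challenge ciphertext by a fresh encryption of 2, submits the result (which differs from the challenge) to the decryption oracle, and divides out the factor.

```python
import secrets

# Toy parameters (assumption for illustration): p = 2q + 1 with q prime,
# and g = 4 generates the subgroup of order q. Far too small for real use.
P, Q, G = 2039, 1019, 4

def keygen():
    sk = secrets.randbelow(Q - 1) + 1        # secret exponent in [1, q-1]
    return pow(G, sk, P), sk                 # (pk, sk)

def encrypt(pk, m):
    r = secrets.randbelow(Q - 1) + 1         # fresh randomness in [1, q-1]
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def decrypt(sk, ct):
    c1, c2 = ct
    return (c2 * pow(c1, P - 1 - sk, P)) % P  # c2 / c1^sk mod p

def multiply(ct_a, ct_b):
    # Public homomorphic evaluation: the componentwise product
    # is an encryption of m_a * m_b.
    return (ct_a[0] * ct_b[0]) % P, (ct_a[1] * ct_b[1]) % P

def cca2_attack(pk, challenge_ct, dec_oracle):
    # Maul the challenge into a different ciphertext that encrypts 2 * m.
    mauled = multiply(challenge_ct, encrypt(pk, 2))
    assert mauled != challenge_ct
    doubled = dec_oracle(mauled)              # a legal query in the CCA2 game
    return (doubled * pow(2, P - 2, P)) % P   # divide by 2 modulo p

pk, sk = keygen()
secret_message = 1234
challenge = encrypt(pk, secret_message)
recovered = cca2_attack(pk, challenge, lambda ct: decrypt(sk, ct))
```

The attack needs nothing beyond the public evaluation procedure, which is why restricting evaluation to holders of a dedicated key is a natural way to recover CCA2-like guarantees.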

To achieve both CCA2-like security and homomorphic evaluation property, 
Emura et al. [21,22] introduced keyed-homomorphic public-key encryption (KH- 
PKE). Contrary to traditional HE, the homomorphic evaluation property of 
KH-PKE is not public. Specifically, KH-PKE has a homomorphic evaluation 
key. Thus, only users with the homomorphic evaluation key can perform homomorphic 
operations. Due to its nature, KH-PKE can achieve CCA2-like security.¹ 
If adversaries do not have the homomorphic evaluation key, then 
KH-PKE satisfies the IND-CCA2 security. Moreover, KH-PKE satisfies stronger 
security than HE even if adversaries receive a homomorphic evaluation key. 
Suppose adversaries receive the homomorphic evaluation key before the challenge 
query, then the strongest security that KH-PKE can satisfy is the IND-CCA1 
security as the case of HE. In contrast, KH-PKE can satisfy stronger securi- 
ties than the IND-CCA1 security if adversaries receive the homomorphic evalua- 
tion key after the challenge query since they continue making decryption queries 
until they receive the homomorphic evaluation key. Moreover, KH-PKE is secure 
against ciphertext validity attacks [20]. 

Emura et al. [22] proposed the notion of KH-PKE but their security proofs con- 
tain bugs (which have been corrected in [21] and they gave the KH-PKE schemes 
under the decisional Diffie-Hellman (DDH) assumption or the decisional com- 
posite residuosity (DCR) assumption). Libert et al. [32] proposed the first KH- 
PKE schemes secure in the model given in [22] using the Decision Linear (DLIN) 
assumption or the symmetric external Diffie-Hellman (SXDH) assumption. 


1 Although Desmedt et al. [18] proposed a HE scheme with a designated evaluation 
called controlled HE, no CCA security was considered unlike the KH-PKE. 
Jutla and Roy [29] proposed a KH-PKE scheme based on SXDH assumption. 
All KH-PKE schemes support either multiplicative or additive homomorphisms. 
Maeda and Nuida [35] proposed a two-level KH-PKE scheme that supports one 
multiplication and any number of additions. Lai et al. [30] proposed the first keyed- 
fully homomorphic encryption (keyed-FHE)² scheme, which is secure under lattice 
assumptions and the indistinguishability obfuscation (iO) [1]. However, the security 
of known candidates of iO [1] remains debatable. Therefore, constructing keyed-FHE schemes 
without iO is an interesting open problem. We remark that the keyed-FHE 
scheme of [30] satisfies only weaker security than the KH-PKE’s security (called 
KH-CCA security) formalized in [21]. In the case where an adversary receives a 
homomorphic evaluation key before the challenge query, the security considered 
in [30] corresponds to the IND-CPA security of (F)HE, while in that case, KH-CCA 
security corresponds to the IND-CCA1 security of (F)HE. 


1.2 Contribution 


In this work, we propose a generic construction of the keyed-FHE without iO. This 
construction uses IND-CCA1 secure FHE and a strong dual-system unbounded 
simulation-sound NIZK (strong DSS-NIZK) introduced by Jutla and Roy [29] as 
building blocks, where the strong DSS-NIZK is used for FHE ciphertext. In our 
security proof, we employ the Naor-Yung paradigm [36,37] to achieve IND-CCA2- 
like security. Since no strong DSS-NIZK scheme exists for NP, we have to construct 
the desired scheme. For this purpose, we show that a modification of Jutla and 
Roy’s strong DSS-NIZK scheme [29] satisfies the requirement of our generic con- 
struction of keyed-FHE, where the construction of the strong DSS-NIZK scheme 
uses a smooth projective hash proof system (PHPS) and an unbounded simulation- 
sound NIZK scheme. We note that there are smooth PHPS [2] secure statisti- 
cally and unbounded simulation-sound NIZK schemes [12,26,31] whose security 
depends on lattice assumptions or the security of the commitment schemes used in 
[12,26]. We remark that for adopting the strong DSS-NIZK scheme above we need 
to assume that the underlying IND-CCA1 secure FHE schemes are publicly verifiable 
(but there exists such a scheme [11]). To sum up, we obtain the first keyed- 
FHE scheme without iO. Note that even if an IND-CPA secure FHE scheme under (a 
variant of) the approximate GCD assumption (e.g., [13,16,19]) is employed to construct 
an IND-CCA1 secure FHE scheme, our generic construction gives no keyed- 
FHE scheme based solely on that assumption because there is no existing HPS 
for approximate GCD-based ciphertexts. Furthermore, another advantage of our 
result is that our keyed-FHE scheme satisfies stronger security (i.e., KH-CCA secu- 
rity) than the existing keyed-FHE scheme [30]. 


1.3 Technical Overview 


We give a brief overview of our results. Since Lai et al. [30] constructed the 
keyed-FHE scheme using iO, the most convincing way to achieve the goal is 


2 In this paper, keyed-FHE is a public key setting. 
to remove the iO from the construction. However, completing the task seems 
technically difficult. Thus, we focus on Jutla and Roy’s KH-PKE scheme [29] 
under the SXDH assumption. Their construction used an ElGamal encryption 
scheme and a stronger version of the dual-system unbounded simulation-sound 
NIZK (DSS-NIZK) for the Diffie-Hellman language. Due to the nature of one- 
time simulation-sound NIZK for the Diffie-Hellman language, their construction 
satisfies IND-CCA2-like security as noted in [27]. Therefore, the remaining task to 
prove the security is how to simulate the homomorphic key reveal oracle (RevHK) 
and how to prove the IND-CCA1 security even after the RevHK query. Here, 
the properties of strong DSS-NIZK resolve the problems. The homomorphic 
evaluation key of the KH-PKE scheme is a trapdoor of the strong DSS-NIZK. 
In particular, one-time full zero-knowledge ensures that the strong DSS-NIZK 
is trapdoor leakage resilient. Moreover, unbounded partial simulation-soundness 
ensures that their KH-PKE scheme satisfies the IND-CCA1 security even after the 
RevHK query. To satisfy the required properties, Jutla and Roy constructed the 
strong DSS-NIZK scheme for the Diffie-Hellman language using quasi-adaptive 
NIZK for the same language [28] and a hash proof system (HPS) [17] that is 
smooth projective and universal$_2$. 

Using a similar approach, we construct the keyed-FHE without iO by replac- 
ing (a variant of) the ElGamal encryption scheme with FHE schemes. For this 
purpose, we have to overcome three issues. First, Jutla and Roy’s KH-PKE 
scheme used strong DSS-NIZK for the Diffie-Hellman language that is not suit- 
able for FHE. Therefore, we construct strong DSS-NIZK for another language 
that handles FHE ciphertexts. Thus, we construct the strong DSS-NIZK for 
NP. Second, Jutla and Roy’s KH-PKE scheme satisfies IND-CCA2-like security 
based on simulation-sound NIZK for the Diffie-Hellman language. That is, just 
replacing the ElGamal encryption scheme with FHE schemes does not satisfy 
IND-CCA2-like security. Here, we resolve the issue by employing the Naor-Yung 
paradigm [36,37]. For simplicity, these modifications enable us to construct a 
keyed-FHE scheme without iO. We observe whether we can construct strong 
DSS-NIZK for NP following a similar approach as Jutla and Roy. Jutla and 
Roy used quasi-adaptive NIZK for the Diffie-Hellman language and an HPS [17] 
that is smooth projective and universal$_2$. In this step, the last issue occurs since 
there is no known lattice-based universal$_2$ HPS. We construct the desired strong 
DSS-NIZK for NP by replacing the universal$_2$ HPS of Jutla and Roy’s construction 
with unbounded simulation-sound NIZK and slightly modifying the construction. 
This completes a brief overview of our generic keyed-FHE scheme. 

All building blocks of our generic construction of keyed-FHE do not require 
iO. We remark that our generic construction of keyed-FHE requires only the 
IND-CCA1 security for the underlying FHE scheme, but our strong DSS-NIZK 
system requires public verifiability for the IND-CCA1 secure FHE scheme. There 
is an IND-CCA1 secure publicly verifiable FHE scheme [11] under zk-SNARK [3, 
4]. It is known that there exist zk-SNARK systems in the quantum random 
oracle model [14]. Hence, there exists an IND-CCA1 secure FHE scheme in the 
quantum random oracle model. In addition, we can also obtain an IND-CCA1 
secure FHE scheme without random oracles if the underlying zk-SNARK is based 
on a strong assumption such as knowledge assumptions. We can construct strong 
DSS-NIZK using the following building blocks: (1) the NIZK system for NP in 
the random oracle model from Σ-protocols (ZKBoo) [26] using the Fiat-Shamir 
transformation [23], the NIZK system secure in the quantum random oracle 
model [12], or the NIZK system secure in the standard model [31], and (2) the 
smooth projective HPS [2] for lattice-based ciphertexts. Therefore, we can obtain 
a keyed-FHE scheme secure in the standard model or the quantum random 
oracle model. Notice that although Libert et al. proposed a simulation-sound NIZK 
system for LWE-like relations in the standard model [31], they do not give a security 
proof that it satisfies the zero-knowledge property after the trapdoor is revealed. 
Nevertheless, since their zero-knowledge property is statistical, it can be applied 
to our construction. However, their scheme is not very efficient, and thus it 
would be interesting to see whether the efficiency of their NIZKs can be improved 
in future work. 


2 Preliminaries 


We use the following notation: For a positive integer $n$, let $[n] := \{1, 2, \ldots, n\}$. 
For $n$ values $x_1, x_2, \ldots, x_n$ and a subset $I \subseteq [n]$ of indexes, let $\{x_i\}_{i \in I}$ be a set 
of values whose indexes are included in $I$, and let $(x_i)_{i \in I}$ be a sequence of values 
whose indexes are included in $I$. Probabilistic polynomial-time is abbreviated 
as PPT. If a function $f : \mathbb{N} \to \mathbb{R}$ fulfills $f(\lambda) = o(\lambda^{-c})$ for every constant 
$c > 0$ and sufficiently large $\lambda \in \mathbb{N}$, then we say that $f$ is negligible in $\lambda$ and 
write $f(\lambda) = \mathsf{negl}(\lambda)$. A probability is overwhelming if it is $1 - \mathsf{negl}(\lambda)$. For a 
probabilistic algorithm $\mathsf{A}$, $y \leftarrow \mathsf{A}(x; r)$ means that $\mathsf{A}$ takes as input $x$ and a 
picked randomness $r$, and it outputs $y$. 
In addition, we describe the definitions of several cryptographic primitives. 


2.1 Non-Interactive Zero-Knowledge Argument 


Definition 1. A non-interactive zero-knowledge argument (NIZK) system for 
a relation $R \subseteq \{0,1\}^* \times \{0,1\}^*$ consists of three polynomial-time algorithms 
$(\mathsf{Gen}, \mathsf{P}, \mathsf{V})$: Let $\mathcal{L}(R) = \{x \mid \exists w \text{ s.t. } (x, w) \in R\}$ be the language defined by $R$. 


- $\mathsf{crs} \leftarrow \mathsf{Gen}(1^\lambda)$: The randomized algorithm $\mathsf{Gen}$ takes as input a security 
parameter $1^\lambda$, and it outputs a common reference string (CRS) $\mathsf{crs}$. 

- $\pi \leftarrow \mathsf{P}(\mathsf{crs}, x, w)$: The randomized algorithm $\mathsf{P}$ takes as input a CRS $\mathsf{crs}$, a 
statement $x$, and a witness $w$, and it outputs a proof $\pi$. 

- $1/0 \leftarrow \mathsf{V}(\mathsf{crs}, x, \pi)$: The deterministic algorithm $\mathsf{V}$ takes as input a CRS $\mathsf{crs}$, 
a statement $x$, and a proof $\pi$, and it outputs 1 or 0. 
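
As a hedged illustration of the (Gen, P, V) syntax only, the sketch below instantiates it with Schnorr's Σ-protocol for knowledge of a discrete logarithm, made non-interactive by the Fiat-Shamir transformation with SHA-256 standing in for the random oracle. The toy group (p = 2039, q = 1019, g = 4), the relation x = g^w mod p, and the function names are assumptions chosen for the example. It demonstrates completeness, not the simulation-soundness or zero-knowledge properties defined next, and it is not the NIZK used in our construction.

```python
import hashlib
import secrets

# Toy group (assumption): p = 2q + 1 and g = 4 has prime order q. Not secure.
P, Q, G = 2039, 1019, 4

def gen():
    # In this example the CRS is just the public group description.
    return (P, Q, G)

def _challenge(crs, x, a):
    # Fiat-Shamir: derive the challenge by hashing the CRS, statement,
    # and commitment, then reduce modulo the group order q.
    data = repr((crs, x, a)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % crs[1]

def prove(crs, x, w):
    p, q, g = crs
    k = secrets.randbelow(q)          # prover's one-time randomness
    a = pow(g, k, p)                  # commitment
    c = _challenge(crs, x, a)         # hashed challenge
    z = (k + c * w) % q               # response
    return (a, z)

def verify(crs, x, proof):
    p, q, g = crs
    a, z = proof
    c = _challenge(crs, x, a)
    # Accept iff g^z = a * x^c (mod p).
    return 1 if pow(g, z, p) == (a * pow(x, c, p)) % p else 0

crs = gen()
w = secrets.randbelow(Q)              # witness: discrete log of x
x = pow(G, w, P)                      # statement
pi = prove(crs, x, w)
```

An honestly generated proof is accepted because g^z = g^k · (g^w)^c, while shifting the response by one makes the verification equation fail.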


We define several properties of NIZKs which are required for constructing 
strong DSS-NIZK. To remove the universal$_2$ property of PHPS, the adversary is 
allowed to query $x$ such that $x \notin \mathcal{L}(R)$ in the definition of unbounded simulation- 
soundness. To account for trapdoor leakage in strong DSS-NIZK, the adversary 
is allowed to obtain a trapdoor $\mathsf{td}$ in the definition of composable zero-knowledge. 

Definition 2. In this paper, it is required that a NIZK system $(\mathsf{Gen}, \mathsf{P}, \mathsf{V})$ 
with a PPT simulator $\mathsf{Sim} = (\mathsf{Sim}_0, \mathsf{Sim}_1)$ satisfies completeness, unbounded 
simulation-soundness, and (composable) zero-knowledge: Let $\mathsf{Sim}_0$ be a PPT algorithm 
which, given $1^\lambda$, outputs a CRS $\mathsf{crs}$ and a trapdoor $\mathsf{td}$, and $\mathsf{Sim}_1$ be a PPT 
algorithm which, given $\mathsf{crs}$, $\mathsf{td}$, and a statement $x$, outputs a simulated proof $\pi$. 


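Completeness. For every $(x, w) \in R$, it holds that 
$$\Pr[\mathsf{crs} \leftarrow \mathsf{Gen}(1^\lambda);\ \pi \leftarrow \mathsf{P}(\mathsf{crs}, x, w) : \mathsf{V}(\mathsf{crs}, x, \pi) = 1] \geq 1 - \mathsf{negl}(\lambda).$$ 
Unbounded Simulation-Soundness. For any PPT adversary $\mathsf{A}$, it holds that 

$$\Pr\left[ \begin{aligned} &(\mathsf{crs}, \mathsf{td}) \leftarrow \mathsf{Sim}_0(1^\lambda);\ Q \leftarrow \emptyset; \\ &(x^*, \pi^*) \leftarrow \mathsf{A}^{\mathsf{Sim}_1(\mathsf{crs}, \mathsf{td}, \cdot)}(\mathsf{crs}) \end{aligned} \ :\ \begin{aligned} &(x^*, \pi^*) \notin Q \wedge x^* \notin \mathcal{L}(R) \\ &\wedge\ \mathsf{V}(\mathsf{crs}, x^*, \pi^*) = 1 \end{aligned} \right] \leq \mathsf{negl}(\lambda),$$ 

where the $\mathsf{Sim}_1$ oracle on input $x$ returns $\pi \leftarrow \mathsf{Sim}_1(\mathsf{crs}, \mathsf{td}, x)$ and sets $Q \leftarrow 
Q \cup \{(x, \pi)\}$. Notice that $\mathsf{A}$ is allowed to query $x$ such that $x \notin \mathcal{L}(R)$. 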

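Composable Zero-Knowledge. For any PPT adversaries $\mathsf{A}_1$ and $\mathsf{A}_2$, it holds 
that 

$$\left| \Pr[\mathsf{crs} \leftarrow \mathsf{Gen}(1^\lambda) : 1 \leftarrow \mathsf{A}_1(\mathsf{crs})] - \Pr[(\mathsf{crs}, \mathsf{td}) \leftarrow \mathsf{Sim}_0(1^\lambda) : 1 \leftarrow \mathsf{A}_1(\mathsf{crs})] \right| \leq \mathsf{negl}(\lambda),$$ 
and 

$$\left| \Pr[(\mathsf{crs}, \mathsf{td}) \leftarrow \mathsf{Sim}_0(1^\lambda) : 1 \leftarrow \mathsf{A}_2^{\mathsf{P}(\mathsf{crs}, \cdot, \cdot)}(\mathsf{crs}, \mathsf{td})] - \Pr[(\mathsf{crs}, \mathsf{td}) \leftarrow \mathsf{Sim}_0(1^\lambda) : 1 \leftarrow \mathsf{A}_2^{\mathsf{Sim}^*(\mathsf{crs}, \mathsf{td}, \cdot, \cdot)}(\mathsf{crs}, \mathsf{td})] \right| \leq \mathsf{negl}(\lambda),$$ 

where the $\mathsf{Sim}^*$ oracle on input $(x, w)$ returns $\bot$ if $(x, w) \notin R$, and 
returns $\pi \leftarrow \mathsf{Sim}_1(\mathsf{crs}, \mathsf{td}, x)$ otherwise. 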


2.2 Dual-System Simulation-Sound NIZK 


Following [29], we describe the definition of dual-system (unbounded) simulation- 
sound NIZK (DSS-NIZK). 


Definition 3. A DSS-NIZK system for a relation $R \subseteq \{0,1\}^* \times \{0,1\}^*$ consists 
of polynomial-time algorithms in three worlds, as follows: Let $\mathcal{L}(R) = \{x \mid 
\exists w \text{ s.t. } (x, w) \in R\}$ be the language defined by $R$. We remark that the witness 
relation parameter $\rho$ is introduced in [29] because it considers quasi-adaptive 
NIZK. We omit the parameter in this paper. 

Real World. A DSS-NIZK in real world consists of three polynomial-time 
algorithms $(\mathsf{Gen}, \mathsf{P}, \mathsf{V})$: 


- $\mathsf{crs} \leftarrow \mathsf{Gen}(1^\lambda)$: The randomized algorithm $\mathsf{Gen}$, called a generator, takes as 
input a security parameter $1^\lambda$, and it outputs a common reference string 
(CRS) $\mathsf{crs}$. 

- $\pi \leftarrow \mathsf{P}(\mathsf{crs}, x, w, \mathsf{lbl})$: The randomized algorithm $\mathsf{P}$, called a prover, takes as 
input a CRS $\mathsf{crs}$, a statement $x$, a witness $w$, and a label $\mathsf{lbl} \in \{0,1\}^*$, and it 
outputs a proof $\pi$. 

- $1/0 \leftarrow \mathsf{V}(\mathsf{crs}, x, \pi, \mathsf{lbl})$: The deterministic algorithm $\mathsf{V}$, called a verifier, takes 
as input a CRS $\mathsf{crs}$, a statement $x$, a proof $\pi$, and a label $\mathsf{lbl} \in \{0,1\}^*$, and it 
outputs 1 or 0. 


Partial-Simulation World. A DSS-NIZK in partial-simulation world consists 
of three polynomial-time algorithms $(\mathsf{sfGen}, \mathsf{sfSim}, \mathsf{pV})$: 


- $(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda)$: The randomized algorithm $\mathsf{sfGen}$, called a semi- 
functional generator, takes as input a security parameter $1^\lambda$, and it outputs 
a semi-functional CRS $\mathsf{crs}$, and two trapdoors $\mathsf{td}_s$ and $\mathsf{td}_v$. 

- $\pi \leftarrow \mathsf{sfSim}(\mathsf{crs}, \mathsf{td}_s, x, \beta, \mathsf{lbl})$: The randomized algorithm $\mathsf{sfSim}$, called a semi- 
functional simulator, takes as input a CRS $\mathsf{crs}$, a trapdoor $\mathsf{td}_s$, a statement $x$, 
a membership-bit $\beta \in \{0,1\}$, and a label $\mathsf{lbl} \in \{0,1\}^*$, and it outputs a proof 
$\pi$. 

- $1/0 \leftarrow \mathsf{pV}(\mathsf{crs}, \mathsf{td}_v, x, \pi, \mathsf{lbl})$: The deterministic algorithm $\mathsf{pV}$, called a private 
verifier, takes as input a CRS $\mathsf{crs}$, a trapdoor $\mathsf{td}_v$, a statement $x$, a proof $\pi$, 
and a label $\mathsf{lbl} \in \{0,1\}^*$, and it outputs 1 or 0. 


One-time Full Simulation World. A DSS-NIZK in one-time full simulation 
world consists of three polynomial-time algorithms $(\mathsf{otfGen}, \mathsf{otfSim}, \mathsf{sfV})$: 


- $(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_{s,1}, \mathsf{td}_v) \leftarrow \mathsf{otfGen}(1^\lambda)$: The randomized algorithm $\mathsf{otfGen}$, called a 
one-time full generator, takes as input a security parameter $1^\lambda$, and it outputs 
a CRS $\mathsf{crs}$ and three trapdoors $\mathsf{td}_s$, $\mathsf{td}_{s,1}$, and $\mathsf{td}_v$. 

- $\pi \leftarrow \mathsf{otfSim}(\mathsf{crs}, \mathsf{td}_{s,1}, x, \mathsf{lbl})$: The randomized algorithm $\mathsf{otfSim}$, called a one- 
time full simulator, takes as input a CRS $\mathsf{crs}$, a trapdoor $\mathsf{td}_{s,1}$, a statement 
$x$, and a label $\mathsf{lbl} \in \{0,1\}^*$, and it outputs a proof $\pi$. 

- $1/0 \leftarrow \mathsf{sfV}(\mathsf{crs}, \mathsf{td}_v, x, \pi, \mathsf{lbl})$: The deterministic algorithm $\mathsf{sfV}$, called a semi- 
functional verifier, takes as input a CRS $\mathsf{crs}$, a trapdoor $\mathsf{td}_v$, a statement $x$, 
a proof $\pi$, and a label $\mathsf{lbl} \in \{0,1\}^*$, and it outputs 1 or 0. 


Definition 4. It is required that a DSS-NIZK system for a relation R satisfies 
completeness, partial zero-knowledge, unbounded partial simulation-soundness, and 
one-time full zero-knowledge: 


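Completeness. For every $(x, w) \in R$ and every $\mathsf{lbl} \in \{0,1\}^*$, it holds that 
$$\Pr[\mathsf{crs} \leftarrow \mathsf{Gen}(1^\lambda);\ \pi \leftarrow \mathsf{P}(\mathsf{crs}, x, w, \mathsf{lbl}) : \mathsf{V}(\mathsf{crs}, x, \pi, \mathsf{lbl}) = 1] \geq 1 - \mathsf{negl}(\lambda).$$ 
(Composable) Partial Zero-Knowledge. For any PPT algorithms $\mathsf{A}_0$ and 
$\mathsf{A}_1$, it holds that 
$$\left| \Pr[\mathsf{crs} \leftarrow \mathsf{Gen}(1^\lambda) : 1 \leftarrow \mathsf{A}_0(\mathsf{crs})] - \Pr[(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda) : 1 \leftarrow \mathsf{A}_0(\mathsf{crs})] \right| \leq \mathsf{negl}(\lambda),$$ 
and 
$$\left| \begin{aligned} &\Pr[(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda) : 1 \leftarrow \mathsf{A}_1^{\mathsf{P}(\mathsf{crs}, \cdot, \cdot, \cdot),\ \mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{V}(\mathsf{crs}, \cdot, \cdot, \cdot)}(\mathsf{crs})] \\ &- \Pr[(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda) : 1 \leftarrow \mathsf{A}_1^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{pV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\mathsf{crs})] \end{aligned} \right| \leq \mathsf{negl}(\lambda),$$ 
where the $\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, x, w, \mathsf{lbl})$ oracle returns $\mathsf{sfSim}(\mathsf{crs}, \mathsf{td}_s, x, \beta = 1, \mathsf{lbl})$, and the 
challenger aborts if either $(x, w, \mathsf{lbl})$ such that $(x, w) \notin R$ is queried to the first 
oracle ($\mathsf{sfSim}^*$ or $\mathsf{P}$), or the second oracle $\mathsf{sfSim}^*$ receives a query $(x, \beta, \mathsf{lbl})$ 
such that $\beta = 0$ or $x \notin \mathcal{L}(R)$. 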

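Unbounded Partial Simulation-Soundness. For any PPT algorithm $\mathsf{A}$, it 
holds that 

$$\Pr\left[ \begin{aligned} &(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda); \\ &(x, \pi, \mathsf{lbl}) \leftarrow \mathsf{A}^{\mathsf{sfSim}(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{pV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\mathsf{crs}) \end{aligned} \ :\ \begin{aligned} &(x \notin \mathcal{L}(R) \vee \mathsf{V}(\mathsf{crs}, x, \pi, \mathsf{lbl}) = 0) \\ &\wedge\ \mathsf{pV}(\mathsf{crs}, \mathsf{td}_v, x, \pi, \mathsf{lbl}) = 1 \end{aligned} \right] \leq \mathsf{negl}(\lambda).$$ 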


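One-time Full Zero-Knowledge. For any PPT algorithm $\mathsf{A} = (\mathsf{A}_0, \mathsf{A}_1)$, it 
holds that 

$$\left| \begin{aligned} &\Pr\left[ \begin{aligned} &(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda); \\ &(x^*, \beta^*, \mathsf{lbl}^*, \mathsf{st}) \leftarrow \mathsf{A}_0^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{pV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\mathsf{crs}); \\ &\pi^* \leftarrow \mathsf{sfSim}(\mathsf{crs}, \mathsf{td}_s, x^*, \beta^*, \mathsf{lbl}^*) \end{aligned} \ :\ 1 \leftarrow \mathsf{A}_1^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{pV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\pi^*, \mathsf{st}) \right] \\ &- \Pr\left[ \begin{aligned} &(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_{s,1}, \mathsf{td}_v) \leftarrow \mathsf{otfGen}(1^\lambda); \\ &(x^*, \beta^*, \mathsf{lbl}^*, \mathsf{st}) \leftarrow \mathsf{A}_0^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{sfV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\mathsf{crs}); \\ &\pi^* \leftarrow \mathsf{otfSim}(\mathsf{crs}, \mathsf{td}_{s,1}, x^*, \mathsf{lbl}^*) \end{aligned} \ :\ 1 \leftarrow \mathsf{A}_1^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{sfV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\pi^*, \mathsf{st}) \right] \end{aligned} \right| \leq \mathsf{negl}(\lambda),$$ 

where $\mathsf{st}$ is state information, and the challenger aborts if one of the following 
conditions holds: 
- The generated $(x^*, \beta^*)$ is not correct for the language $\mathcal{L}(R)$.³ 
- $(x, \beta, \mathsf{lbl})$ such that the membership-bit $\beta$ is not correct for $\mathcal{L}(R)$ is queried 
to the first oracle $\mathsf{sfSim}^*$. 
- The generated $(x^*, \pi^*, \mathsf{lbl}^*)$ is queried to $\mathsf{sfV}$/$\mathsf{pV}$. 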


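Propositions 1 and 2 were proven in [29]. Here, for a DSS-NIZK system $\Pi_{\mathsf{DN}}$, 
let $\mathsf{Adv}^{\mathsf{pzk}}_{\Pi_{\mathsf{DN}}}(\lambda)$ be the maximum probability that any PPT adversary breaks the 
partial zero-knowledge of $\Pi_{\mathsf{DN}}$, let $\mathsf{Adv}^{\mathsf{upss}}_{\Pi_{\mathsf{DN}}}(\lambda)$ be the maximum probability that 
any PPT adversary breaks the unbounded partial simulation-soundness of $\Pi_{\mathsf{DN}}$, 
and let $\mathsf{Adv}^{\mathsf{otzk}}_{\Pi_{\mathsf{DN}}}(\lambda)$ be the maximum probability that any PPT adversary breaks 
the one-time full zero-knowledge of $\Pi_{\mathsf{DN}}$. 


Proposition 1 ([29], Lemma 4 (true simulation-soundness)). If a DSS- 
NIZK $\Pi_{\mathsf{DN}}$ fulfills both of properties partial zero-knowledge and unbounded partial 
simulation-soundness, then for any PPT adversary $\mathsf{A}$, it holds that 

$$\Pr\left[ \begin{aligned} &(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda); \\ &(x, \pi, \mathsf{lbl}) \leftarrow \mathsf{A}^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot)}(\mathsf{crs}) \end{aligned} \ :\ \mathsf{V}(\mathsf{crs}, x, \pi, \mathsf{lbl}) = 1 \wedge x \notin \mathcal{L}(R) \right] \leq \mathsf{Adv}^{\mathsf{pzk}}_{\Pi_{\mathsf{DN}}}(\lambda) + \mathsf{Adv}^{\mathsf{upss}}_{\Pi_{\mathsf{DN}}}(\lambda),$$ 

where the challenger aborts if $\mathsf{A}$ issues a query $(y, \beta, \mathsf{lbl})$ such that $y \notin \mathcal{L}(R)$ or 
$\beta = 0$, to the $\mathsf{sfSim}^*$ oracle. 


³ $(x, \beta)$ is correct for a language $\mathcal{L}(R)$ (or $\beta$ is correct for $x$) if $x \in \mathcal{L}(R) \wedge \beta = 1$, or 
$x \notin \mathcal{L}(R) \wedge \beta = 0$. $(x, \beta)$ is not correct for $\mathcal{L}(R)$ (or $\beta$ is not correct for $x$) otherwise. 


Proposition 2 ([29], Lemma 12 (simulation-soundness of semi- 
functional verifier)). If a DSS-NIZK $\Pi_{\mathsf{DN}}$ fulfills both of properties one-time 
full zero-knowledge and unbounded partial simulation-soundness, then, for any 
PPT algorithm $\mathsf{A} = (\mathsf{A}_0, \mathsf{A}_1)$, it holds that 

$$\Pr\left[ \begin{aligned} &(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_{s,1}, \mathsf{td}_v) \leftarrow \mathsf{otfGen}(1^\lambda); \\ &(x^*, \mathsf{lbl}^*, \beta^*, \mathsf{st}) \leftarrow \mathsf{A}_0^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{sfV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\mathsf{crs}); \\ &\pi^* \leftarrow \mathsf{otfSim}(\mathsf{crs}, \mathsf{td}_{s,1}, x^*, \mathsf{lbl}^*); \\ &(x, \mathsf{lbl}, \pi) \leftarrow \mathsf{A}_1^{\mathsf{sfSim}^*(\mathsf{crs}, \mathsf{td}_s, \cdot, \cdot, \cdot),\ \mathsf{sfV}(\mathsf{crs}, \mathsf{td}_v, \cdot, \cdot, \cdot)}(\pi^*, \mathsf{st}) \end{aligned} \ :\ \begin{aligned} &\mathsf{sfV}(\mathsf{crs}, \mathsf{td}_v, x, \pi, \mathsf{lbl}) = 1 \\ &\wedge\ x \notin \mathcal{L}(R) \end{aligned} \right] \leq \mathsf{Adv}^{\mathsf{upss}}_{\Pi_{\mathsf{DN}}}(\lambda) + \mathsf{Adv}^{\mathsf{otzk}}_{\Pi_{\mathsf{DN}}}(\lambda),$$ 

where the challenger aborts if at least one of the following conditions holds: 


- For $(x, \beta, \mathsf{lbl})$ queried to the $\mathsf{sfSim}^*$ oracle, $(x, \beta)$ is not correct for $\mathcal{L}(R)$. 

- $\beta^*$ is not the correct membership-bit of $\mathcal{L}(R)$. 

- $(x^*, \mathsf{lbl}^*, \pi^*)$ is queried to $\mathsf{sfV}$. 

- The output of $\mathsf{A}$ is the same as $(x^*, \mathsf{lbl}^*, \pi^*)$. 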


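Furthermore, a stronger notion of DSS-NIZK is defined as follows. We say that a 
reveal event occurs when $\mathsf{td}_s$ is revealed to adversaries, where $(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_v) \leftarrow \mathsf{sfGen}(1^\lambda)$ 
or $(\mathsf{crs}, \mathsf{td}_s, \mathsf{td}_{s,1}, \mathsf{td}_v) \leftarrow \mathsf{otfGen}(1^\lambda)$. 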


Definition 5 (Strong DSS-NIZK [29]). A DSS-NIZK system with partial 
simulation trapdoor reveal oracle is a strong DSS-NIZK system with the following 
changes to the DSS-NIZK definition: 


- The first part of the composable partial zero-knowledge continues to hold. 

- The second part of the composable partial zero-knowledge holds under the additional 
restriction that the adversary cannot invoke the third oracle (i.e., $\mathsf{V}$ or 
$\mathsf{pV}$ oracle) after the reveal event. 

- The unbounded partial simulation-soundness continues to hold. 

- The trapdoors $\mathsf{td}_s$ and $\mathsf{td}_{s,1}$ generated by $\mathsf{otfGen}$ are the same and statistically 
indistinguishable from $\mathsf{td}_s$ generated by $\mathsf{sfGen}$. 

- The one-time full zero-knowledge holds under the additional restriction that 
$(x^*, \beta^*, \mathsf{lbl}^*)$ is such that $x^* \in \mathcal{L}(R)$ and $\beta^* = 1$ and the second oracle (i.e., 
$\mathsf{pV}$ or $\mathsf{sfV}$ oracle) is not invoked after the reveal event. 

- The simulation-soundness of $\mathsf{sfV}$ (Proposition 2) holds under the additional 
restriction that the $\mathsf{sfV}$ oracle is not invoked after the reveal event. Notice that 
there is no restriction that $(x^*, \beta^*, \mathsf{lbl}^*)$ is such that $x^* \in \mathcal{L}(R)$ and $\beta^* = 1$. 


2.3 (Keyed-)Fully Homomorphic Encryption 


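Definition 6. A fully homomorphic encryption (FHE) scheme consists of four 
polynomial-time algorithms $(\mathsf{KGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Eval})$: For a security parameter $\lambda$, 
let $\mathcal{M} = \mathcal{M}(\lambda)$ be a message space. 

- $(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KGen}(1^\lambda)$: The randomized algorithm $\mathsf{KGen}$ takes as input a security 
parameter $1^\lambda$, and it outputs a public key $\mathsf{pk}$ and a secret key $\mathsf{sk}$. 

- $\mathsf{ct} \leftarrow \mathsf{Enc}(\mathsf{pk}, m)$: The randomized algorithm $\mathsf{Enc}$ takes as input a public key 
$\mathsf{pk}$ and a message $m \in \mathcal{M}$, and it outputs a ciphertext $\mathsf{ct}$. 

- $m/\bot \leftarrow \mathsf{Dec}(\mathsf{sk}, \mathsf{ct})$: The deterministic algorithm $\mathsf{Dec}$ takes as input a secret 
key $\mathsf{sk}$ and a ciphertext $\mathsf{ct}$, and it outputs a message $m \in \mathcal{M}$ or a rejection 
symbol $\bot$. 

- $\widehat{\mathsf{ct}} \leftarrow \mathsf{Eval}(C, (\mathsf{ct}^{(1)}, \mathsf{ct}^{(2)}, \ldots, \mathsf{ct}^{(\ell)}))$: The deterministic or randomized algorithm 
$\mathsf{Eval}$ takes as input a circuit $C : \mathcal{M}^\ell \to \mathcal{M}$ and a tuple of ciphertexts 
$(\mathsf{ct}^{(1)}, \mathsf{ct}^{(2)}, \ldots, \mathsf{ct}^{(\ell)})$, and it outputs a new ciphertext $\widehat{\mathsf{ct}}$. 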


We require that an FHE scheme meet both correctness and compactness. 


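Definition 7 (Correctness). An FHE scheme $(\mathsf{KGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Eval})$ satisfies 
correctness if the following conditions hold: 


- For every $(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KGen}(1^\lambda)$ and every $m \in \mathcal{M}$, it holds that $\mathsf{Dec}(\mathsf{sk}, \mathsf{ct}) = m$ 
with overwhelming probability, where $\mathsf{ct} \leftarrow \mathsf{Enc}(\mathsf{pk}, m)$. 

- For every $(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KGen}(1^\lambda)$, every circuit $C$, and every $(m^{(1)}, \ldots, m^{(\ell)}) \in 
\mathcal{M}^\ell$, it holds that $\mathsf{Dec}(\mathsf{sk}, \widehat{\mathsf{ct}}) = C(m^{(1)}, \ldots, m^{(\ell)})$ with overwhelming probability, 
where $\widehat{\mathsf{ct}} \leftarrow \mathsf{Eval}(C, (\mathsf{ct}^{(1)}, \ldots, \mathsf{ct}^{(\ell)}))$ and for every $i \in [\ell]$, $\mathsf{ct}^{(i)} \leftarrow 
\mathsf{Enc}(\mathsf{pk}, m^{(i)})$. 

Definition 8 (Compactness). An FHE scheme satisfies compactness if there 
exists a polynomial $\mathsf{poly}$ such that the output-size of $\mathsf{Eval}(\cdot, \cdot)$ is at most $\mathsf{poly}(\lambda)$ 
for every security parameter $\lambda$. 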


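Definition 9 (IND-CCA1 security). An FHE scheme $\Pi_{\mathsf{FHE}} = (\mathsf{KGen}, \mathsf{Enc}, \mathsf{Dec}, 
\mathsf{Eval})$ is IND-CCA1 secure if for any PPT adversary $\mathsf{A} = (\mathsf{A}_0, \mathsf{A}_1)$ against $\Pi_{\mathsf{FHE}}$, 
the advantage 

$$\mathsf{Adv}^{\mathsf{ind\text{-}cca1}}_{\Pi_{\mathsf{FHE}}, \mathsf{A}}(\lambda) := \left| \Pr\left[ \begin{aligned} &(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KGen}(1^\lambda); \\ &(m_0, m_1, \mathsf{st}) \leftarrow \mathsf{A}_0^{\mathsf{Dec}(\mathsf{sk}, \cdot)}(\mathsf{pk}); \\ &b \stackrel{\$}{\leftarrow} \{0, 1\};\ \mathsf{ct}^* \leftarrow \mathsf{Enc}(\mathsf{pk}, m_b); \\ &b' \leftarrow \mathsf{A}_1(\mathsf{ct}^*, \mathsf{st}) \end{aligned} \ :\ b = b' \right] - \frac{1}{2} \right|$$ 

is negligible in $\lambda$, where $\mathsf{st}$ is state information. 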
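
The order of events in this experiment can be sketched as a small test harness. This is a minimal sketch under stated assumptions: the function `ind_cca1_experiment`, the callable-based interface, and the deliberately broken identity "scheme" are hypothetical and serve only to show when the decryption oracle is and is not available.

```python
import secrets

def ind_cca1_experiment(kgen, enc, dec, adv_find, adv_guess):
    """Run one IND-CCA1 game; return True iff the adversary guesses b."""
    pk, sk = kgen()
    # Phase 1: A_0 may query the decryption oracle before the challenge.
    m0, m1, st = adv_find(pk, lambda ct: dec(sk, ct))
    b = secrets.randbelow(2)
    ct_star = enc(pk, (m0, m1)[b])
    # Phase 2: A_1 gets the challenge and state, but no decryption oracle.
    return adv_guess(ct_star, st) == b

# Deliberately broken toy "scheme" (hypothetical): encryption is the
# identity, so any adversary can read the challenge directly.
def toy_kgen():
    return ("pk", "sk")

def toy_enc(pk, m):
    return m

def toy_dec(sk, ct):
    return ct

def adv_find(pk, dec_oracle):
    return 0, 1, None            # challenge messages m_0 = 0, m_1 = 1

def adv_guess(ct_star, st):
    return ct_star               # the ciphertext equals m_b, hence equals b

wins = sum(
    ind_cca1_experiment(toy_kgen, toy_enc, toy_dec, adv_find, adv_guess)
    for _ in range(100)
)
```

Against the identity scheme the adversary wins every run, so its advantage is 1/2, the maximum possible. A secure scheme would keep the empirical win rate close to 1/2.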


Following the definition of KH-PKE in [21], we describe the definition of 
keyed-fully homomorphic encryption (keyed-FHE) given by Lai et al. [30], except 
that the adversaries are allowed to access the decryption oracle until the homo- 
morphic evaluation key is revealed. 


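Definition 10. A keyed-FHE scheme consists of four polynomial-time algorithms 
$(\mathsf{KGen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Eval})$: For a security parameter $\lambda$, let $\mathcal{M} = \mathcal{M}(\lambda)$ be 
a message space. 


- $(\mathsf{pk}, \mathsf{sk}_d, \mathsf{sk}_h) \leftarrow \mathsf{KGen}(1^\lambda)$: The randomized algorithm $\mathsf{KGen}$ takes as input a 
security parameter $1^\lambda$, and it outputs a public key $\mathsf{pk}$, a decryption key $\mathsf{sk}_d$, 
and a homomorphic evaluation key $\mathsf{sk}_h$. 

- $\mathsf{ct} \leftarrow \mathsf{Enc}(\mathsf{pk}, m)$: The randomized algorithm $\mathsf{Enc}$ takes as input a public key 
$\mathsf{pk}$ and a message $m \in \mathcal{M}$, and it outputs a ciphertext $\mathsf{ct}$. 

- $m/\bot \leftarrow \mathsf{Dec}(\mathsf{sk}_d, \mathsf{ct})$: The deterministic algorithm $\mathsf{Dec}$ takes as input a decryption 
key $\mathsf{sk}_d$ and a ciphertext $\mathsf{ct}$, and it outputs a message $m$ or a rejection 
symbol $\bot$. 

- $\widehat{\mathsf{ct}}/\bot \leftarrow \mathsf{Eval}(\mathsf{sk}_h, C, (\mathsf{ct}^{(1)}, \mathsf{ct}^{(2)}, \ldots, \mathsf{ct}^{(\ell)}))$: The deterministic or randomized 
algorithm $\mathsf{Eval}$ takes as input a homomorphic evaluation key $\mathsf{sk}_h$, a circuit 
$C : \mathcal{M}^\ell \to \mathcal{M}$, and a tuple of ciphertexts $(\mathsf{ct}^{(1)}, \mathsf{ct}^{(2)}, \ldots, \mathsf{ct}^{(\ell)})$, and it outputs 
a new ciphertext $\widehat{\mathsf{ct}}$ or a rejection symbol $\bot$. 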


We require that a keyed-FHE scheme meet both correctness and compactness. 


Definition 11 (Correctness). A keyed-FHE scheme (KGen, Enc, Dec, Eval)
satisfies correctness if the following conditions hold:

- For every (pk, sk_d, sk_h) ← KGen(1^λ) and every m ∈ M, it holds that
  Dec(sk_d, ct) = m with overwhelming probability, where ct ← Enc(pk, m).

- For every (pk, sk_d, sk_h) ← KGen(1^λ), every circuit C: M^ℓ → M, and every
  (m^(1), ..., m^(ℓ)) ∈ M^ℓ, it holds that Dec(sk_d, ct) = C(m^(1), ..., m^(ℓ))
  with overwhelming probability, where ct ← Eval(sk_h, C, (ct^(1), ..., ct^(ℓ)))
  and for every i ∈ [ℓ], ct^(i) ← Enc(pk, m^(i)).


Definition 12 (Compactness). A keyed-FHE scheme satisfies compactness if
there exists a polynomial poly such that the output-size of Eval(sk_h, ·, ·) is
at most poly(λ) for every security parameter λ.


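Definition 13 (KH-CCA security). A keyed-FHE scheme Π_KFHE = (KGen,
Enc, Dec, Eval) is KH-CCA secure if for any PPT adversary A = (A_0, A_1) against
Π_KFHE, the advantage

$$
\mathsf{Adv}^{\mathrm{kh\text{-}cca}}_{\Pi_{\mathrm{KFHE}},\mathcal{A}}(\lambda) :=
\left| \Pr\left[ b = b' \;:\;
\begin{array}{l}
(\mathsf{pk},\mathsf{sk}_d,\mathsf{sk}_h) \leftarrow \mathsf{KGen}(1^\lambda);\\
(m_0, m_1, \mathsf{st}) \leftarrow \mathcal{A}_0^{\mathsf{RevHK}(\cdot),\,\mathsf{Eval}(\mathsf{sk}_h,\cdot),\,\mathsf{Dec}(\mathsf{sk}_d,\cdot)}(\mathsf{pk});\\
b \xleftarrow{\$} \{0,1\};\ \mathsf{ct}^* \leftarrow \mathsf{Enc}(\mathsf{pk}, m_b);\\
b' \leftarrow \mathcal{A}_1^{\mathsf{RevHK}(\cdot),\,\mathsf{Eval}(\mathsf{sk}_h,\cdot),\,\mathsf{Dec}(\mathsf{sk}_d,\cdot)}(\mathsf{ct}^*, \mathsf{st})
\end{array}
\right] - \frac{1}{2} \right|
$$

is negligible in λ, where st is state information, D is a list which is set
as D ← {ct*} in the challenge phase, and the oracles above are defined as follows: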


- Homomorphic key reveal oracle RevHK: Given a request, the RevHK oracle
  returns sk_h.

- Evaluation oracle Eval(sk_h, ·): Given an Eval query (C, (ct^(1), ..., ct^(ℓ))),
  the Eval oracle checks whether the RevHK oracle has been queried before. If so,
  it returns ⊥. Otherwise, it returns ct/⊥ ← Eval(sk_h, C, (ct^(1), ..., ct^(ℓ))).
  In addition, if ct ≠ ⊥ and one of the ciphertexts ct^(1), ..., ct^(ℓ) is in D,
  it sets D ← D ∪ {ct}.

- Decryption oracle Dec(sk_d, ·): This oracle is not available if A has accessed
  the RevHK oracle and obtained the challenge ciphertext ct*. Given a Dec query
  ct, the Dec oracle returns Dec(sk_d, ct) if ct ∉ D, and returns ⊥ otherwise.
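
The oracle rules of Definition 13 amount to bookkeeping around the list D and a flag recording whether RevHK has been queried. The following Python sketch illustrates only that bookkeeping. `ToyScheme` is a hypothetical and completely insecure stand-in for a keyed-FHE scheme (its ciphertexts carry the plaintext), and all class and method names are assumptions made for illustration, not part of the paper.

```python
import itertools
import secrets


class ToyScheme:
    """Hypothetical stand-in for a keyed-FHE scheme; offers no security."""

    def __init__(self):
        self._fresh = itertools.count()  # makes every ciphertext distinct

    def kgen(self):
        return "pk", "sk_d", "sk_h"

    def enc(self, pk, m):
        return ("ct", next(self._fresh), m)

    def dec(self, sk_d, ct):
        return ct[2]

    def eval(self, sk_h, circuit, cts):
        return ("ct", next(self._fresh), circuit(*(c[2] for c in cts)))


class KHCCAOracles:
    """Oracle bookkeeping of the KH-CCA game; None plays the role of the symbol ⊥."""

    def __init__(self, scheme):
        self.scheme = scheme
        self.pk, self.sk_d, self.sk_h = scheme.kgen()
        self.D = []            # ciphertexts derived from the challenge
        self.revealed = False  # True once RevHK has been queried
        self.ct_star = None    # set in the challenge phase

    def challenge(self, m0, m1):
        b = secrets.randbelow(2)
        self.ct_star = self.scheme.enc(self.pk, (m0, m1)[b])
        self.D = [self.ct_star]
        return self.ct_star

    def rev_hk(self):
        self.revealed = True
        return self.sk_h

    def eval(self, circuit, cts):
        if self.revealed:
            return None  # Eval oracle refuses after RevHK
        ct = self.scheme.eval(self.sk_h, circuit, cts)
        if ct is not None and any(c in self.D for c in cts):
            self.D.append(ct)  # dependent query: track the derivative
        return ct

    def dec(self, ct):
        if self.revealed and self.ct_star is not None:
            return None  # oracle no longer available
        if ct in self.D:
            return None
        return self.scheme.dec(self.sk_d, ct)
```

In this sketch a ciphertext obtained by evaluating on the challenge is added to D and is afterwards refused by the decryption oracle, which is the mechanism that rules out trivial wins.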
3 Generic Construction of Keyed-FHE


Our Construction. We propose a generic construction of a keyed-FHE scheme
Π_KFHE from two IND-CCA1 secure FHE schemes Π_FHE,1, Π_FHE,2 and a (strong)
DSS-NIZK system Π_DN. We briefly give an overview of the construction, whose
spirit is similar to Jutla and Roy's KH-PKE scheme [29] except that we use
the Naor-Yung paradigm [36]. Let (pk_1, sk_1) and (pk_2, sk_2) denote two pairs of
public/secret keys of Π_FHE,1 and Π_FHE,2. A public key pk = (pk_1, pk_2, crs) of
Π_KFHE consists of the two public keys (pk_1, pk_2) of schemes Π_FHE,1, Π_FHE,2
and the CRS crs of Π_DN, while the secret key sk_d = sk_1 is the secret key of
Π_FHE,1. The ciphertext ct = (ct_1, ct_2, π) consists of two FHE ciphertexts
(ct_1, ct_2), both of which are encryptions of m, and a proof π that (ct_1, ct_2)
are encryptions of the same message. The decryption algorithm first checks the
validity of π by using the real world verification algorithm V_DN, then decrypts
ct_1 by using sk_d = sk_1. To complete the overview, we show how to evaluate
keyed-FHE ciphertexts ct^(1), ..., ct^(ℓ) for a circuit C and obtain ct. A point
to note is that we must create a proof π without knowledge of the message
C(m^(1), ..., m^(ℓ)) of ct. For this purpose, we use the DSS-NIZK system Π_DN in
the partial-simulation world, as in Jutla and Roy's KH-PKE scheme [29]. Then, we
set the homomorphic evaluation key sk_h = td_s as the trapdoor of Π_DN.
Therefore, the (composable) partial zero-knowledge ensures that π can be
computed correctly by using the sfSim_DN algorithm. Here, we note that the
verification algorithm V_DN can correctly verify the proof created by the
sfSim_DN algorithm owing to partial zero-knowledge.

To sum up, we use the following primitives: An FHE scheme Π_FHE,i =
(KGen_{F,i}, Enc_{F,i}, Dec_{F,i}, Eval_{F,i}) for i ∈ {1,2}, and a DSS-NIZK system
Π_DN in the partial-simulation world (sfGen_DN, sfSim_DN, pV_DN) for a relation
R_DN = {((ct_1, ct_2), (m, r_1, r_2)) | ct_1 = Enc_{F,1}(pk_1, m; r_1) ∧
ct_2 = Enc_{F,2}(pk_2, m; r_2)}, where (pk_1, sk_1) ← KGen_{F,1}(1^λ) and
(pk_2, sk_2) ← KGen_{F,2}(1^λ). We also remark that a proof generated by the
sfSim_DN algorithm can be verified by the real world verification algorithm V_DN
owing to the partial zero-knowledge property. Thus, we use the V_DN algorithm in
our construction.

Our scheme Π_KFHE = (KGen, Enc, Dec, Eval) is constructed as follows:

- (pk, sk_d, sk_h) ← KGen(1^λ):
  1. (pk_1, sk_1) ← KGen_{F,1}(1^λ), (pk_2, sk_2) ← KGen_{F,2}(1^λ).
  2. (crs, td_s, td_v) ← sfGen_DN(1^λ).
  3. Output pk = (pk_1, pk_2, crs), sk_d = sk_1, and sk_h = td_s.
- ct ← Enc(pk, m):
  1. ct_1 ← Enc_{F,1}(pk_1, m; r_1), ct_2 ← Enc_{F,2}(pk_2, m; r_2).
  2. π ← P_DN(crs, (ct_1, ct_2), (m, r_1, r_2), ∅).
  3. Output ct = (ct_1, ct_2, π).
- m/⊥ ← Dec(sk_d, ct): Let ct = (ct_1, ct_2, π).
  1. If V_DN(crs, (ct_1, ct_2), π, ∅) = 1, output m ← Dec_{F,1}(sk_1, ct_1).
     Otherwise, output ⊥.
- ct/⊥ ← Eval(sk_h, C, (ct^(1), ..., ct^(ℓ))): Let ct^(i) = (ct_1^(i), ct_2^(i), π^(i))
  for i ∈ [ℓ].
  1. Output ⊥ if V_DN(crs, (ct_1^(i), ct_2^(i)), π^(i), ∅) = 0 for some i ∈ [ℓ].
  2. ct_1 ← Eval_{F,1}(C, (ct_1^(1), ..., ct_1^(ℓ))),
     ct_2 ← Eval_{F,2}(C, (ct_2^(1), ..., ct_2^(ℓ))).
  3. π ← sfSim_DN(crs, td_s, (ct_1, ct_2), 1, ∅).
  4. Output ct = (ct_1, ct_2, π).
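
The data flow of the four algorithms can be traced with a small Python sketch. It is only a structural mock under loud assumptions: `ToyFHE` and `ToyDSSNIZK` are hypothetical placeholders with no security at all (ciphertexts expose the plaintext, and a "proof" is a MAC tag that the mock verifier recomputes from a key held inside the object, whereas a real verifier would need only crs). The verification trapdoor td_v and the label ∅ are omitted, and every name below is an assumption for illustration.

```python
import hashlib
import hmac
import os


class ToyFHE:
    """Placeholder for an IND-CCA1 secure FHE scheme (insecure: plaintext in the clear)."""

    def kgen(self):
        sk = os.urandom(16)
        return hashlib.sha256(sk).hexdigest(), sk  # (pk, sk)

    def enc(self, pk, m, r):
        return (pk, m, r)

    def dec(self, sk, ct):
        return ct[1]  # the mock ignores sk

    def eval(self, circuit, cts):
        return (cts[0][0], circuit(*(c[1] for c in cts)), "eval")


class ToyDSSNIZK:
    """Placeholder for a DSS-NIZK; a proof is a MAC tag over the statement."""

    def __init__(self, relation):
        self.relation = relation
        self._key = None

    def sf_gen(self):
        self._key = os.urandom(16)
        return "crs", self._key  # (crs, td_s)

    def _tag(self, stmt):
        return hmac.new(self._key, repr(stmt).encode(), hashlib.sha256).hexdigest()

    def prove(self, crs, stmt, wit):
        assert self.relation(stmt, wit), "witness does not match statement"
        return self._tag(stmt)

    def sf_sim(self, crs, td_s, stmt):
        assert td_s == self._key, "simulation needs the trapdoor"
        return self._tag(stmt)

    def verify(self, crs, stmt, proof):
        return hmac.compare_digest(self._tag(stmt), proof)


class KeyedFHE:
    def __init__(self):
        self.f1, self.f2 = ToyFHE(), ToyFHE()
        self.nizk = ToyDSSNIZK(self._same_message)
        self.pk = None

    def _same_message(self, stmt, wit):
        (ct1, ct2), (m, r1, r2) = stmt, wit
        return ct1 == self.f1.enc(ct1[0], m, r1) and ct2 == self.f2.enc(ct2[0], m, r2)

    def kgen(self):
        pk1, sk1 = self.f1.kgen()
        pk2, _sk2 = self.f2.kgen()      # sk_2 is not part of any output key
        crs, td_s = self.nizk.sf_gen()
        self.pk = (pk1, pk2, crs)
        return self.pk, sk1, td_s       # (pk, sk_d, sk_h)

    def enc(self, pk, m):
        pk1, pk2, crs = pk
        r1, r2 = os.urandom(8).hex(), os.urandom(8).hex()
        ct1, ct2 = self.f1.enc(pk1, m, r1), self.f2.enc(pk2, m, r2)
        return ct1, ct2, self.nizk.prove(crs, (ct1, ct2), (m, r1, r2))

    def dec(self, sk_d, ct):
        ct1, ct2, proof = ct
        if not self.nizk.verify(self.pk[2], (ct1, ct2), proof):
            return None                 # rejection symbol
        return self.f1.dec(sk_d, ct1)

    def eval(self, sk_h, circuit, cts):
        crs = self.pk[2]
        if not all(self.nizk.verify(crs, (c[0], c[1]), c[2]) for c in cts):
            return None
        ct1 = self.f1.eval(circuit, [c[0] for c in cts])
        ct2 = self.f2.eval(circuit, [c[1] for c in cts])
        # The evaluator knows no witness, so the proof is simulated with sk_h = td_s.
        return ct1, ct2, self.nizk.sf_sim(crs, sk_h, (ct1, ct2))
```

The point the mock makes visible is that Eval never calls the prover: it has no witness for the evaluated pair and relies on the trapdoor instead.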


The correctness of Π_KFHE follows from the correctness of Π_FHE,1 and Π_FHE,2
and the completeness of Π_DN. The first condition of correctness holds since the
completeness of Π_DN ensures that V_DN outputs 1 and the correctness of Π_FHE,1
ensures that Dec_{F,1} correctly outputs m with overwhelming probability.
Similarly, the second condition of correctness also holds since the composable
partial zero-knowledge of Π_DN ensures that V_DN outputs 1 even if the proof π
is computed by the semi-functional simulator sfSim_DN. In addition, the
output-size of sfSim_DN used in Eval is equal to that of P_DN since the
semi-functional simulator sfSim_DN simulates the prover P_DN. Thus, the
compactness of Π_KFHE follows from the compactness of Π_FHE,1 and Π_FHE,2.


Remark 1. Canetti et al. [11] showed that IND-CCA1 secure FHE can be
constructed from IND-CPA secure FHE and zk-SNARK via the Naor-Yung
transformation. Here, the circuit C to be evaluated is a witness and thus the
underlying NIZK system needs to be succinct. On the other hand, in our
evaluation algorithm, ciphertexts are first evaluated by the evaluation
algorithms of the underlying IND-CCA1 secure FHE schemes, and then the
underlying NIZK system proves, using the trapdoor, that the two ciphertexts
ct_1 and ct_2 have the same plaintext. So, C is not a witness here, and we do
not have to directly employ zk-SNARK in our construction.


Security Analysis 


Theorem 1 (KH-CCA security). If both Π_FHE,1 and Π_FHE,2 are IND-CCA1
secure, and Π_DN is a strong DSS-NIZK system, then the resulting keyed-FHE
scheme Π_KFHE is KH-CCA secure.


Theorem 1 shows the security of our keyed-FHE scheme. The proof of this
theorem appears in the full version of this paper. For simplicity, we explain that
our scheme satisfies KH-CCA security if the underlying NIZK system Π_DN meets
the properties of strong DSS-NIZKs. We first give an intuitive explanation. To
guarantee security against adaptive chosen ciphertext attacks before a
homomorphic evaluation key (a trapdoor of Π_DN) is revealed by RevHK oracle
access, the underlying DSS-NIZK system must satisfy (one-time)
simulation-soundness so that we can return the non-malleable challenge
ciphertext correctly. In addition, if the ciphertexts generated by the
evaluation oracle are malleable, it is possible to break KH-CCA security by
querying such ciphertexts to the decryption oracle. Thus, unbounded (partial)
simulation-soundness is required for Π_DN in order to return non-malleable
ciphertexts for evaluation queries. Moreover, our scheme needs the partial
zero-knowledge and one-time full zero-knowledge properties of strong DSS-NIZKs,
so that the challenge message can be hidden even if a simulation trapdoor of
Π_DN is revealed. Since we assume that the underlying FHE schemes are IND-CCA1
secure, we can simulate decryption queries until the challenge phase.
Remark 2. Although we assume that the underlying FHE schemes are IND-CCA1
secure in Theorem 1, we can prove KH-CCA security even when the underlying
FHE schemes are IND-CPA secure. For this purpose, we follow the generic
construction of Canetti et al. [11] and additionally use a zk-SNARK.
Nevertheless, we assume IND-CCA1 security of the underlying FHE schemes since
it enables us to obtain a much simpler proof.


Table 1. Summary of Games in the Proof of Theorem 1

| Game | ct*_2 (component of ct*) | π* (component of ct*) | C(m_1, ..., m_ℓ) computed for Dep. Eval | Verification of Indep. Eval | Verification of Dec | Msg-Rec. of Dec |
|---|---|---|---|---|---|---|
| Game_0 | Enc_{F,2}(m_b) | P*_DN | Ordinary | V_DN | V_DN | Dec_{F,1} |
| Game_1 | Enc_{F,2}(m_b) | sfSim*_DN | Ordinary | pV_DN | pV_DN | Dec_{F,1} |
| Game_2 | Enc_{F,2}(m_b) | sfSim*_DN | Random | pV_DN | pV_DN | Dec_{F,1} |
| Game_3 | Enc_{F,2}(m_b) | otfSim*_DN | Random | sfV_DN | sfV_DN | Dec_{F,1} |
| Game_4 | Enc_{F,2}(0^{|m_b|}) | otfSim*_DN | Random | sfV_DN | sfV_DN | Dec_{F,1} |
| Game_5 | Enc_{F,2}(0^{|m_b|}) | otfSim*_DN | Random | sfV_DN | sfV_DN | Dec_{F,2} |

“C(m_1, ..., m_ℓ) computed for Dep. Eval” denotes a message C(m_1, ..., m_ℓ) for ct
generated by the Eval oracle on input a dependent Eval query. “Ordinary” (resp.
“Random”) means that C(m_1, ..., m_ℓ) is a message whose encryption is generated
by the Eval algorithm on input encryptions queried by the adversary A (resp.
encryptions of random messages). “Verification of Indep. Eval” denotes a
verification algorithm in the Eval algorithm run by the Eval oracle on input an
independent Eval query. “Verification of Dec” denotes a verification algorithm in
the Dec algorithm run by the Dec oracle on input a Dec query. “Msg-Rec. of Dec”
denotes an algorithm which recovers a message in the Dec algorithm run by the
Dec oracle on input a Dec query. For i ∈ {1,2}, let Enc_{F,i}(·) = Enc_{F,i}(pk_i, ·)
and Dec_{F,i}(·) = Dec_{F,i}(sk_i, ·). Let P*_DN = P_DN(crs, (ct*_1, ct*_2),
(m_b, r*_1, r*_2), ∅), sfSim*_DN = sfSim_DN(crs, td_s, (ct*_1, ct*_2), 1, ∅), and
otfSim*_DN = otfSim_DN(crs, td_{s,1}, (ct*_1, ct*_2), ∅).


Next, we give a more concrete explanation. Let a dependent Eval query be
a query (C, (ct^(1), ..., ct^(ℓ))) issued to the Eval oracle such that at least
one of ct^(1), ..., ct^(ℓ) is in D, and let an independent Eval query be a query
issued to the Eval oracle such that none of ct^(1), ..., ct^(ℓ) is in D. In order
to prove Theorem 1, we consider security games Game_0, ..., Game_5 (Table 1
summarizes these games). The proof of the indistinguishability between Game_0
and Game_3 is similar to a part of the security proof of Jutla and Roy's scheme
[29], because this indistinguishability mainly follows from the properties of
the underlying strong DSS-NIZK (see also Table 2). The remaining proofs are
different from the security proof of [29], because our scheme employs the
Naor-Yung paradigm while Jutla and Roy's scheme uses a variant of ElGamal
encryption. Furthermore, we describe the important point of our security proof.
In Game_4, the challenge ciphertext is replaced by an invalid one due to a
reduction from the security of the underlying primitives, in the same way as
the security proof of the Naor-Yung paradigm [36]. However, when an adversary
issues the challenge ciphertext (or derivatives of the challenge ciphertext) to
the Eval oracle, this oracle must return a valid ciphertext. In order to
simulate the Eval oracle correctly even in this case, the Eval oracle on input
a dependent Eval query returns a random and valid ciphertext instead of an
ordinary evaluated ciphertext, in Game_2. If Game_2 is indistinguishable from
the previous game, it is possible to replace the ordinary challenge ciphertext
by an invalid one in the security games after Game_2.


Table 2. Outline of the Proof of Theorem 1

| Game | Property |
|---|---|
| Game_0 ≈ Game_1 | partial zero-knowledge of Π_DN, true simulation-soundness of Π_DN |
| Game_1 ≈ Game_2 | one-time full zero-knowledge of Π_DN, unbounded partial simulation-soundness of Π_DN, IND-CCA1 security of Π_FHE,1 and Π_FHE,2 |
| Game_2 ≈ Game_3 | one-time full zero-knowledge of Π_DN, simulation-soundness of sfV_DN |
| Game_3 ≈ Game_4 | simulation-soundness of sfV_DN, IND-CCA1 security of Π_FHE,2 |
| Game_4 ≈ Game_5 | one-time full zero-knowledge of Π_DN, unbounded partial simulation-soundness of Π_DN |
| Game_5 | IND-CCA1 security of Π_FHE,1 |


4 Strong DSS-NIZK from Smooth PHPS and Unbounded 
Simulation-Sound NIZK 


In this section, we show that there exists a strong DSS-NIZK system for NP, 
constructed from a smooth PHPS and an unbounded simulation-sound NIZK. 
Although our construction is similar to the generic construction [29] of strong
DSS-NIZKs for linear subspaces, the properties we require of the underlying
primitives differ from those required there. As mentioned in Sect. 1.2, the
previous construction assumes the underlying PHPS to be universal and uses a
true simulation-sound quasi-adaptive NIZK, while in our construction the
underlying PHPS does not have to be universal_2, and the underlying NIZK
satisfies unbounded simulation-soundness (Definition 2).

Furthermore, we modify the generic construction [29] under our assumption, 
slightly. This is because the languages of existing PHPSs for lattice-based cipher- 
texts are not necessarily identical to those of existing unbounded simulation- 
sound NIZKs based on lattice assumptions. 

Following [17], we define smooth PHPSs to describe our DSS-NIZK scheme. 
Definition 14 (Projective Hash Family [17]). Let X and Π be finite sets.
Let H = {H_k}_{k∈K} be a collection of functions indexed by K so that H_k : X → Π
is a hash function for every k ∈ K. Then, (H, K, X, Π) is called a hash family.
Let L be a non-empty proper subset of X. Let S be a finite set, and α : K → S
be a function. H = (H, K, X, Π, L, S, α) is called a projective hash family (PHF)
if for every k ∈ K, the action of H_k on L is determined by α(k).


Definition 15 ((Smooth) Projective Hash Proof System [17]). For
languages defined by a relation R ⊆ {0,1}* × {0,1}*, the PHF H =
(H, K, X, Π, L, S, α) constitutes a projective hash proof system (PHPS) if α, H_k,
and a public evaluation function Ĥ are efficiently computable, where Ĥ takes as
input the projection key α(k), a statement x ∈ L = L(R) = {x | ∃w s.t. (x, w) ∈
R}, and a witness w such that (x, w) ∈ R, and it computes H_k(x).

Furthermore, a PHPS constituted by a PHF H = (H, K, X, Π, L, S, α) is
called a labeled PHPS if the public evaluation function takes an additional input
lbl ∈ {0,1}* which is called a label. A labeled PHPS is ε-smooth if the statistical
distance between U(H) = (x, α(k), π') and V(H) = (x, α(k), H_k(x, lbl)) is at
most ε for all k ∈ K, all x ∈ X\L, all lbl ∈ {0,1}*, and all π' ∈ Π.
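
As a concrete illustration of Definitions 14 and 15, the following Python sketch instantiates the classic DDH-based projective hash of Cramer and Shoup over a toy group. This is not the lattice-based PHPS used later in the paper; it is swapped in only because it fits in a few lines. The parameters (P = 2039, Q = 1019, generators 4 and 9) are far too small for any security, labels are omitted, and all function names are assumptions for illustration.

```python
import secrets

# Toy group: the squares modulo the safe prime P = 2*Q + 1 form a subgroup of prime order Q.
P, Q = 2039, 1019
G1, G2 = 4, 9  # two squares different from 1, hence both of order Q


def keygen():
    """Sample a hashing key k = (a, b) from K = Z_Q x Z_Q."""
    return secrets.randbelow(Q), secrets.randbelow(Q)


def alpha(k):
    """Projection alpha(k) = G1^a * G2^b, the public projection key."""
    a, b = k
    return (pow(G1, a, P) * pow(G2, b, P)) % P


def statement(w):
    """A statement x = (G1^w, G2^w) in the language L, with witness w."""
    return pow(G1, w, P), pow(G2, w, P)


def hash_private(k, x):
    """H_k(x) = u1^a * u2^b, computable for any x in X given the key k."""
    (a, b), (u1, u2) = k, x
    return (pow(u1, a, P) * pow(u2, b, P)) % P


def hash_public(proj, w):
    """Public evaluation: alpha(k)^w, computable from the projection key and a witness."""
    return pow(proj, w, P)


k = keygen()
w = secrets.randbelow(Q)
x = statement(w)
# On L, the action of H_k is determined by alpha(k): both evaluations agree.
assert hash_private(k, x) == hash_public(alpha(k), w)
```

The agreement holds because (G1^w)^a · (G2^w)^b = (G1^a · G2^b)^w. For a pair (G1^w, G2^w') with w ≠ w' there is no witness, and smoothness says that the hash value then looks random even given α(k).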


In order to construct our DSS-NIZK system Π_DN, we assume that the following
primitives are used: An ε-smooth labeled PHPS Π_PHPS with a public evaluation
function Ĥ, which is constituted by a PHF H = (H, K, X_H, L_H, Π, S, α),
and a NIZK system Π_N = (Gen_N, P_N, V_N) for an augmented relation R_N =
{((x, x_H, π_H, lbl), (w, w_H)) | (x, w) ∈ R ∧ π_H = Ĥ(α(k), (x_H, x||lbl), w_H)},
with a PPT simulator (Sim_{N,0}, Sim_{N,1}) (where R ⊆ X × W is the relation of
Π_DN).

In addition, we assume that there exist polynomial-time algorithms E_1, E_2,
E_3, G, and E_G, which are defined as follows: E_1 samples auxiliary information
ψ of R, which can be regarded as a witness of R; E_2 given ψ decides whether x
is in L(R); E_3 samples a uniformly random value from Π; and we write
(x_H; w_H) ← (G||E_G)(x, lbl; w) when G given (x, lbl) ∈ X × {0,1}* outputs
x_H ∈ X_H (then, we write x_H ← G(x, lbl)), and E_G given w outputs a witness w_H
by using the internal information of G(x, lbl). (G||E_G)(x, lbl; w) outputs
(x_H; w_H) such that x_H is in the language L_H of Π_PHPS (and (x_H, w_H) is in
the relation R_H of Π_PHPS) if x is in L(R), but x_H is not in L_H (and
(x_H, w_H) ∉ R_H) otherwise.

Furthermore, there is a gap between the two languages L(R) and L_H (e.g.,
L(R) ⊂ L_H) in general. This may be a problem when constructing G. Thus, we
assume that a statement x is publicly verifiable for a language L_X such that
L(R) = L_H ∩ L_X.

We explain that assuming the algorithms E_1, E_2, E_3, G, E_G, and the public
verifiability for L_X is reasonable. The algorithms E_1, E_2, and E_3 are the same
as the ones assumed in the DSS-NIZK construction of [29]. Thus, we explain
that the remaining assumptions are reasonable in some cases (in particular, a
case where we apply our DSS-NIZK to our keyed-FHE scheme). For example,
we consider the language of the PHPS of [2], which can be simply defined as
L_H = {ct | ∃w, Enc_pk(0; w) = ct}, where Enc_pk(·) is an encryption algorithm of
public key encryption. In addition, we suppose that this public key encryption
scheme for L_H is an IND-CCA1 secure FHE scheme from IND-CPA secure FHE
schemes and a zk-SNARK [11]. Let L_X be the language for the zk-SNARK used
in this IND-CCA1 secure FHE scheme [11]. First, assuming the public verifiability
for L_X is reasonable because the FHE scheme [11] is based on the Naor-Yung
paradigm, and it is clear that the ciphertexts are publicly verifiable for L_X.
Next, we show that assuming the G algorithm is reasonable. G checks whether two
FHE ciphertexts are in L_X. If so, G transforms this pair into a statement in
L_H by using the technique of the generic construction [34] of multi-key FHE,
starting from an FHE scheme.* Otherwise, it samples x ∉ L_H and outputs this.
Hence, if two ciphertexts are in L(R), then this pair is also in L_H. Otherwise,
it is not in L_H due to the public verifiability of the IND-CCA1 secure FHE
scheme. Hence, the algorithm G fulfills the required property. Accordingly,
there exists an algorithm which generates the corresponding witness by using
the algorithm of this transformation. Hence, there exist algorithms G and E_G.
Our DSS-NIZK system Π_DN for a relation R is described as follows:

Real World consists of

- crs ← Gen(1^λ): Sample k ←$ K and compute crs_N ← Gen_N(1^λ). Output
  crs = (α(k), crs_N).

- π ← P(crs, x, w, lbl): Compute (x_H; w_H) ← (G||E_G)(x, lbl; w),
  π_H ← Ĥ(α(k), (x_H, x||lbl), w_H) and π_N ← P_N(crs_N, (x, x_H, π_H, lbl), (w, w_H)).
  Output π = (x_H, π_H, π_N).

- 1/0 ← V(crs, x, π, lbl): Output 1 if V_N(crs_N, (x, x_H, π_H, lbl), π_N) = 1.
  Output 0 otherwise.

Partial Simulation World consists of

- (crs, td_s, td_v) ← sfGen(1^λ): Sample ψ by using E_1. Sample k ←$ K and
  compute (crs_N, td_N) ← Sim_{N,0}(1^λ). Output crs = (α(k), crs_N),
  td_s = (k, td_N), and td_v = (ψ, k).

- π ← sfSim(crs, td_s, x, β, lbl):
  • If β = 1, then compute x_H ← G(x, lbl), π_H ← H_k(x_H, x||lbl) and
    π_N ← Sim_{N,1}(crs_N, td_N, (x, x_H, π_H, lbl)).
  • If β = 0, then sample π_H ←$ Π by using E_3 and compute x_H ← G(x, lbl)
    and π_N ← Sim_{N,1}(crs_N, td_N, (x, x_H, π_H, lbl)).
  Output π = (x_H, π_H, π_N).

- 1/0 ← pV(crs, td_v, x, π, lbl): Output 1 if it holds that x ∈ L(R) by using
  E_2 given ψ, H_k(x_H, x||lbl) = π_H, and V_N(crs_N, (x, x_H, π_H, lbl), π_N) = 1.
  Output 0 otherwise.

One-time Full Simulation World consists of

- (crs, td_s, td_{s,1}, td_v) ← otfGen(1^λ): Sample k ←$ K and compute
  (crs_N, td_N) ← Sim_{N,0}(1^λ). Output crs = (α(k), crs_N), td_s = td_{s,1} =
  (k, td_N), and td_v = k.


* Concretely, two FHE ciphertexts Enc(pk_1, m_1) and Enc(pk_2, m_2) can be
transformed into a ciphertext Enc(pk_1, Enc(pk_2, m_1 − m_2)). If for two FHE
ciphertexts Enc(pk_1, m_1; r_1) and Enc(pk_2, m_2; r_2), (m, r_1, r_2) where
m = m_1 = m_2 is a witness of the Naor-Yung language, then
Enc(pk_1, Enc(pk_2, m_1 − m_2)) is a statement in L_H.
- π ← otfSim(crs, td_{s,1}, x, lbl): Compute x_H ← G(x, lbl), π_H ←
  H_k(x_H, x||lbl), and π_N ← Sim_{N,1}(crs_N, td_N, (x, x_H, π_H, lbl)). Output
  π = (x_H, π_H, π_N).

- 1/0 ← sfV(crs, td_v, x, π, lbl): Output 1 if it holds that H_k(x_H, x||lbl) = π_H
  and V_N(crs_N, (x, x_H, π_H, lbl), π_N) = 1. Output 0 otherwise.


Theorem 2. If Π_PHPS is ε-smooth, and Π_N is an unbounded simulation-sound
NIZK, then the resulting NIZK system Π_DN is a strong DSS-NIZK system.


Theorem 2 shows the properties of Π_DN. The proof of this theorem appears
in the full version of this paper. The overview of our proof is as follows: The
partial zero-knowledge and unbounded partial simulation-soundness of Π_DN can
be proven in the same way as the proof of [29]. In the one-time full
zero-knowledge game, an adversary is allowed to submit (x*, β*, lbl*) such that
x* ∉ L(R) in order to get a proof π* generated by sfSim or otfSim. The
difference between pV and sfV is the verification of x ∈ L(R) with E_2. Thus,
the outputs of pV and sfV may be different if the adversary issues (x, π, lbl)
to the given verifier oracle, such that x ∉ L(R), (x, π, lbl) ≠ (x*, π*, lbl*),
and the verifier oracle accepts. In the proof of [29], it is proven that this
event does not occur due to the universal property of Π_PHPS and a special
property of the underlying NIZK. In our proof, the event occurs with negligible
probability, due to the unbounded simulation-soundness of Definition 2. This is
because ((x*, x*_H, π*_H, lbl*), π*_N) is included in the list Q of the unbounded
simulation-soundness game of Π_N, and issuing the query above
(x, π = (x_H, π_H, π_N), lbl) corresponds to the adversary's winning condition
in Definition 2 (i.e., (x, x_H, π_H, lbl) ∉ L(R_N), ((x, x_H, π_H, lbl), π_N) ∉ Q,
and V_N(crs_N, (x, x_H, π_H, lbl), π_N) = 1). Therefore, Π_PHPS does not need to
satisfy the universal property, and Π_N must fulfill unbounded
simulation-soundness.


5 Feasibility of Our Construction 


We show that a keyed-FHE scheme without iO can be constructed from existing 
schemes. For the FHE used in our generic construction, IND-CCA1 security is
required. However, our generic construction of strong DSS-NIZKs requires not 
only IND-CCA1 security but also public verifiability of ciphertexts (see Sect. 4). 
Canetti et al. [11] proposed generic constructions of IND-CCA1 secure FHE. 
They employed the Naor-Yung paradigm [36] with two IND-CPA secure FHE 
schemes and zk-SNARK [3,4]. This construction satisfies both IND-CCA1 secu- 
rity and public verifiability of ciphertexts, since it is possible to check the valid- 
ity of ciphertexts owing to the public verifiability of the underlying zk-SNARK. 
Although they also showed that multi-key IBFHE can be used for constructing 
IND-CCA1 secure FHE, this IND-CCA1 secure scheme does not necessarily satisfy 
public verifiability. Thus, we cannot apply this one to our generic construction 
of keyed-FHE. Although a generic construction of IND-CCA1 secure FHE from
iO was also proposed in [11], we emphasize that no iO is required for constructing
IND-CCA1 secure FHE from the viewpoint of feasibility.
The remaining part is strong DSS-NIZK. As described in Sect. 1.2, NIZKs
used to obtain a strong DSS-NIZK for NP can be constructed from Σ-protocols
[26] by using the Fiat-Shamir transformation [23], and there exists such
a NIZK in the quantum random oracle model [12] or the standard model [31].
There exists a smooth (approximate) PHPS [2] for lattice-based ciphertexts.
Hence, we can obtain a strong DSS-NIZK for NP by using existing schemes.
Abstract. In a broadcast encryption system, a sender can encrypt a
message for any subset of users who are listening on a broadcast chan- 
nel. The goal is to leverage the broadcasting structure to achieve better 
efficiency than individually encrypting to each user; in particular, reduc- 
ing the ciphertext size required to transmit securely, although other fac- 
tors such as public and private key size and the time to execute setup, 
encryption and decryption are also important. 

In this work, we conduct a detailed performance evaluation of eleven 
public-key, pairing-based broadcast encryption schemes offering different 
features and security guarantees, including public-key, identity-based, 
traitor-tracing, private linear and augmented systems. We implemented 
each system using the MCL Java pairings library, reworking some of 
the constructions to achieve better efficiency. We tested their perfor- 
mance on a variety of parameter choices, resulting in hundreds of data 
points to compare, with some interesting results from the classic Boneh- 
Gentry-Waters scheme (CRYPTO 2005) to Zhandry’s recent generalized 
scheme (CRYPTO 2020), and more. We combine this performance data 
with data we collected on practical usage scenarios to determine which 
schemes are likely to perform best for certain applications, such as video 
streaming services, online gaming, live sports betting and distributor- 
limited applications. This work can inform both practitioners and future 
cryptographic designers in this area. 


1 Introduction 


In a broadcast encryption system [9], a sender can encrypt a message for any 
subset of users who are listening on a broadcast channel. We focus on public-key 
systems, where there is a public system key that allows anyone to encrypt a
message to any set S of his choice out of an established set of N users. The 
public system key is established by a master authority, who also distributes 
individualized secret keys to each user in the system. If a user is in the set S 
for a particular broadcast, then she can decrypt that broadcast using her secret 
key. A critical security property for these systems is collusion resistance, which 
guarantees that users not in S learn nothing about the broadcast message. Some 
schemes offer a traitor-tracing functionality that protects against digital piracy; 
specifically, it guarantees that if one or more malicious users work together to 
release piracy information (e.g., software or a key) that decrypts on the broadcast 
channel, then this piracy information can be traced back to at least one of them. 

The goal of broadcast encryption is efficiency. In particular, the goal is to 
leverage the broadcasting structure to achieve better efficiency than individually 
encrypting to each user. This can result in huge practical savings. To measure the 
concrete performance benefits offered by various broadcast encryption systems, 
for several different sizes of system users N and encryption subsets S ⊆ N, we
will compare each broadcast encryption scheme in terms of ciphertext size, pub- 
lic and private key size, and the setup, key generation, encryption and decryp- 
tion times. We focus on pairing-based broadcast systems, since this is the most 
promising algebraic setting for reducing ciphertext size and obtaining fast run- 
times (see [4] for more on pairings). We also compare the broadcast schemes 
to an optimized “baseline” scheme¹ derived from ElGamal encryption [11] with
shared parameters (see Sect. 2) that individually encrypts to each user in the 
broadcast set S. 


Our Contributions and Results. To the best of our knowledge, this work is the 
only current detailed performance study of public-key, pairing-based broadcast 
encryption systems. Although schemes can be loosely grouped and compared at 
the asymptotic level for performance purposes, the tradeoffs, underlying con- 
stant factors, scalability and differing system features could significantly impact 
various applications. To provide the community with a solid foundation for com- 
parisons, this work includes the following: 


- We collected eleven public-key, pairing-based broadcast encryption systems,
  which are detailed in Table 2 of Sect. 3 and which we thought were likely
  to perform the best. In some cases, we made efficiency-focused alterations
  to the schemes, such as creating a separate setup and key generation
  function or finding the most efficient asymmetric pairing implementation for a
  scheme presented symmetrically. Any change from the original publication is
  documented herein, with details in the full version [4].

- We implemented the eleven broadcast systems using the MCL pairings library
  (currently employed by some cutting-edge cryptocurrency companies) and
  the baseline ElGamal system using OpenSSL. These implementations will be
  made publicly available. We ran hundreds of tests on these systems for various
  parameter choices, reporting on those results in Sect. 3. This is a contribution
  in terms of providing the community with data and public reference
  implementations; additionally, careful implementation is also important for
  rooting out any potential issues in prior publications. In the course of our
  study, we discovered a technical issue in a prior publication; we communicated
  it to the author(s) and they updated their scheme accordingly. (Details are
  removed for submission anonymity, but we will be explicit in the final version.)
  Thus, this implementation effort has also been useful as an additional
  verification process for prior work.

- In Tables 7 and 8, we document that individual encryption is more efficient
  than broadcast encryption for systems with 100 users or fewer. The
  100 < N < 10,000 range is a gray area where there are tradeoffs to be made.
  But once a system's users exceed 10,000, broadcast encryption dominates the
  individual encryption (baseline) in overall performance.

- To understand which broadcast system offers the “best” performance, we
  researched the yearly reports and shareholder letters of companies such as
  Nvidia [20], Disney [5] and Netflix [6] to understand the performance demands
  of some interesting applications for broadcast encryption. We summarize our
  findings in Sect. 4. We start with the classic application of video streaming
  and then explore the emerging applications of online gaming, live sports
  betting and more. In a nutshell, if traitor tracing is not required, we found
  that the classic Boneh-Gentry-Waters system [2, §3.2] provides the best
  tradeoffs for video streaming and is strong for online gaming too, with [23]
  also strong for gaming. Zhandry's generalized nonrisky system (see [4] for
  details) can be tuned to optimize a parameter of interest (e.g., ciphertext
  size), although this usually results in another parameter (e.g., decryption
  time) becoming infeasible. For live sports betting, the smaller number of users
  and the importance of encryption speed make Gentry-Waters [15] the preferred
  choice. We found the private tracing system of Gentry, Kumarasubramanian,
  Sahai and Waters [12] to provide the best overall system performance when
  tracing is needed, but it may not be fast enough for live streaming
  applications. For peer-to-many applications, we favored Boneh-Gentry-Waters
  [2, §3.1] when many keys must be generated. Finally, we discovered that none of
  the identity-based broadcast systems (IBBE) were practical for large user
  applications, so a practical IBBE remains an interesting open research problem.

¹ Because the Sect. 2 baseline scheme will not require the pairing operation, it
is implemented using an elliptic curve group, whose elements are even smaller;
thus requiring real performance gains from the broadcast systems to overtake it.


The schemes we implemented (as taken from their respective publications),
including the ElGamal baseline, achieve security against chosen plaintext
attacks [16] (CPA), while NIST recommends that deployed systems achieve a
stronger notion of security against chosen ciphertext attacks [8,19,21] (CCA).
While efficient general transformations from CPA to CCA exist for public key
encryption [10], it is not clear if these can be applied to broadcast encryption
systems without compromising some of their functionality (e.g., traitor tracing).
This is an exciting area for future research.
We believe this timely implementation study will inform practitioners as
they look to harness the performance savings of broadcast encryption, while 
also providing context for future broadcast encryption designs. 


2 An ElGamal Baseline and Other Related Works 


We construct a baseline system, which encrypts the same message individually to 
each privileged user, so that we can contrast its performance with the broadcast 
systems. This CPA-secure system uses ElGamal encryption [11] with shared 
parameters. Private key sizes are constant, but the public key size grows linearly 
with the number of system users. 


Setup($\lambda, N$): Let $G$ be an elliptic curve group of prime order $p$ for which the 
DDH assumption holds. Pick a random generator $g \in G$. For $i = 1, 2, \ldots, N$, 
pick a random $x_i \in \mathbb{Z}_p$ and compute $h_i = g^{x_i}$. The master public key is 
$PK = (g, h_1, \ldots, h_N)$ and the private key for user $i$ is $x_i$. Output the public 
key $PK$ and the $N$ private keys $x_1, x_2, \ldots, x_N$. 


Enc($PK, S, m$): To encrypt a message $m$ to a set of users $S$, first pick a random 
$y \in \mathbb{Z}_p$. For each user $i \in S$, compute $z_i = (h_i)^y \cdot m$. Output the ciphertext 
$CT = (g^y, z_1, \ldots, z_{|S|})$. 


Dec($i, CT$): Parse the ciphertext as $CT = (c, z_1, \ldots, z_{|S|})$. User $i$ decrypts by 
computing $m = z_i / c^{x_i}$. 

We implemented this baseline scheme and tested it in OpenSSL over the curves 
NIST P-192, NIST P-224, NIST P-256, NIST P-384, and NIST P-521. Based on 
the results from our implementation, we use the runtimes over the curve NIST 
P-256 as the basis of comparison with the pairing-based scheme runtimes over 
BN254. The runtimes are the fastest over this curve, and the 128-bit security 
provided by NIST P-256 is close to the 110-bit security provided by BN254. 


Table 1. Runtimes for the ElGamal baseline over different curves when N = 100K 
and |S| = 10K. Let s denote seconds and ms denote milliseconds. 


Curve      | Security   | Setup Time | Encrypt Time | Decrypt Time 
NIST P-192 | 96 bits    | 39.27 s    | 3.78 s       | 0.51 ms 
NIST P-224 | 112 bits   | 5.51 s     | 515.13 ms    | 0.09 ms 
NIST P-256 | 128 bits   | 2.40 s     | 248.08 ms    | 0.05 ms 
NIST P-384 | 192 bits   | 145.82 s   | 14.66 s      | 2.67 ms 
NIST P-521 | 260.5 bits | 37.04 s    | 3.62 s       | 0.73 ms 


Additional related works are discussed in the full version [4]. 


3 Broadcast Encryption Implementations and Analysis 


We refer to the following for the definitions of broadcast encryption [9], and the 
identity-based [7,22], trace-and-revoke [18], augmented [1], traitor-tracing [3,17], 
and private-linear [3] variants. 

We provide a reference implementation² and comparison of eleven broadcast 
encryption systems. All schemes are implemented in the asymmetric pairing 
setting using the MCL pairings library³ with a Barreto-Naehrig BN254 curve. 
This curve is conjectured to have approximately 100-bit security. Group elements 
in $G_1$, $G_2$, and $G_T$ occupy 32 bytes, 64 bytes, and 381 bytes of space in memory, 
respectively. Elements in $\mathbb{Z}_p$ occupy 254 bits of space. We also compare all of the 
systems to the ElGamal baseline system of Sect. 2, which was implemented using 
prime-order elliptic curve groups in OpenSSL since it does not require pairings. 
The baseline system was implemented in C++, but all the others are in Java. 
We chose Java for the pairing-based broadcast schemes because the MCL Java 
library offers a simple, flexible software interface, which allowed us to 
implement these systems easily and compare them to each other. 

We compare the setup, encryption, key generation, and decryption times in 
each of our systems. The runtimes are tested by setting the size of the subset 
of privileged users S to be equal to some percent of the total number of users 
in the system. This ensures that the subset size scales with the number of users 
in the system. All of the runtimes were measured on a 2014 MacBook Air with a 
1.4 GHz dual-core Intel Core i5 processor and 4 GB RAM. 
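
The measurement setup described above, in which the privileged set is a fixed percentage of all users, can be sketched as follows. This is only an illustrative harness under stated assumptions: the helper names `privileged_subset` and `time_call` are hypothetical, and `dummy_encrypt` is a stand-in workload rather than any of the schemes in this paper.

```python
import random
import time

def privileged_subset(n_users, percent, rng):
    """Pick a random privileged set S containing `percent`% of users 1..n_users."""
    size = max(1, n_users * percent // 100)
    return set(rng.sample(range(1, n_users + 1), size))

def time_call(fn, *args, repeats=5):
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) * 1000.0 / repeats

def dummy_encrypt(subset):
    # Stand-in workload (hypothetical): one cheap operation per privileged user.
    acc = 1
    for i in subset:
        acc = (acc * (i + 2)) % 1000003
    return acc

rng = random.Random(0)
for n in (100, 1000, 10000):
    S = privileged_subset(n, 10, rng)
    print(n, len(S), f"{time_call(dummy_encrypt, S):.3f} ms")
```

Tying |S| to a percentage of N, rather than fixing it, is what makes both dimensions of the later timing tables grow together.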

We also compare the sizes of the public key, private key, and ciphertext for 
each of the systems. Table 2 shows how the sizes scale asymptotically. It also 
provides an overview of the systems that we implement. 


3.1 Boneh-Gentry-Waters Scheme Using Asymmetric Pairings 


The Boneh-Gentry-Waters scheme [2] is a fully collusion-resistant public 
key broadcast system for stateless receivers. The paper describes two schemes, 
both secure against static adversaries. In the “special case” [2, §3.1], the 
public key grows linearly with the total number of users in the broadcast 
system, while ciphertext sizes and private key sizes are constant. In the 
general construction [2, §3.2], the public key and ciphertext are both of size 
$O(\lambda \cdot \sqrt{N})$, and private key sizes are constant. We rewrite both of these schemes 
using Type-III pairings, strategically placing certain group elements in $G_1$ and 
$G_2$ to optimize the efficiency of our construction. We also add a KeyGen function 
instead of generating the private keys for all $N$ users in the Setup phase. This 
facilitates the comparison of this scheme to other public-key broadcast 
encryption schemes in Sect. 3.4. 

In Table 3, we present the encryption times for the BGW special case 
construction for varying subset sizes. We define the subset size to be some percent 
of the total number of users in the system. Notice a general trend that as the 
subset size percentage increases, so does the encryption time. This is because 
even though the ciphertext sizes are constant, $O(|S|)$ multiplications over $G_1$ are 
required to compute the product $v \cdot \prod_{j \in S} g_{N+1-j}$ during encryption. 


² https://github.com/ArushC/broadcast. 
³ https://github.com/herumi/mcl. 


Table 2. A summary of pairing-based broadcast encryption systems. Let $N$ be the 
number of users in the broadcast system, $\ell$ be the maximal size of the subset of users 
$S$ such that $|S| \leq \ell$, and $\lambda$ be the security parameter. Note that for the broadcast and 
trace system [24, §9.3], $a \in [0, 1]$. 


Scheme           | Type        | Ciphertext Size             | Private Key Size            | Public Key Size             | Security 
ElGamal baseline | Public Key  | $O(\lambda \cdot |S|)$      | $O(\lambda)$                | $O(\lambda \cdot N)$        | CPA-secure 
[2, §3.1]        | Broadcast   | $O(\lambda)$                | $O(\lambda)$                | $O(\lambda \cdot N)$        | static 
[2, §3.2]        | Broadcast   | $O(\lambda \cdot \sqrt{N})$ | $O(\lambda)$                | $O(\lambda \cdot \sqrt{N})$ | static 
[23]             | Broadcast   | $O(\lambda)$                | $O(\lambda \cdot N)$        | $O(\lambda \cdot N)$        | adaptive 
[15, §3.1]       | Broadcast   | $O(\lambda)$                | $O(\lambda \cdot N)$        | $O(\lambda \cdot N)$        | semi-static 
[15, §4.1]       | IBBE        | $O(\lambda \cdot \ell)$     | $O(\lambda)$                | $O(\lambda \cdot \ell)$     | adaptive 
[15, §4.3.1]     | IBBE        | $O(\lambda)$                | $O(\lambda)$                | $O(\lambda \cdot \ell)$     | semi-static 
[14, §3.1]       | IBBE        | $O(\lambda)$                | $O(\lambda \cdot N)$        | $O(\lambda \cdot N)$        | adaptive 
[24, §9.3]       | Risky Trace | $O(\lambda \cdot N)$        | $O(\lambda)$                | $O(\lambda)$                | adaptive 
[13, §5.2]       | [illegible] | [illegible]                 | [illegible]                 | [illegible]                 | public tracing 
[24, §9.3]       | Trace       | $O(\lambda \cdot N^{1-a})$  | $O(\lambda \cdot N^{1-a})$  | $O(\lambda \cdot N^{a})$    | adaptive 
[12]             | PLBE        | $O(\lambda \cdot \sqrt{N})$ | $O(\lambda)$                | $O(\lambda \cdot \sqrt{N})$ | private tracing 

We implement the general construction by setting $B = \lfloor \sqrt{N} \rfloor$, as the authors 
of [2] suggest. Here, $B$ is a parameter that scales the public key and ciphertext 
to the desired size, and setting $B$ to the specified value enables us to achieve 
the optimal public key and ciphertext sizes of $O(\lambda \cdot \sqrt{N})$. Again, 
we modify this system to include a KeyGen function instead of generating the 
private keys for all $N$ users in the Setup phase. 

We notice that runtimes increase when we read the table from left to right. 
However, when we read the table from top to bottom, we see mixed results. To 
understand the context for the discussion that follows, we ask readers to refer 
to the bottom of page 7 of [2]. 

In the encryption algorithm, runtimes are determined by two significant steps. 
First is the computation of $S_\ell$, which is dominated by the number of operations 
required to calculate $\hat{S}_\ell = S \cap \{\ell B - B + 1, \ell B - B + 2, \ldots, \ell B\}$. In order to 
compute the intersection of two sets, for each item in the latter set, the system 
must check if there is a corresponding item in $S$. Since we use hash sets to 
compute this intersection, the time that it takes to look up an item in $S$ is $O(1)$. 
But we still need to iterate over each item in the block of size $B$, so this 
step takes time $O(B)$. The computation of the subset $\hat{S}_\ell$ is an intermediate 
step which dominates the computation of $S_\ell$ for $\ell = 1, 2, \ldots, A$. Therefore, the 
total time to compute all of these subsets is $O(A \cdot B) = O(N)$. After computing 
these subsets, there is a second step. The system still has to calculate the product 
$v_i \cdot \prod_{j \in S_i} g_{B+1-j}$ for $i \in \{1, 2, \ldots, A\}$. In our implementation, this 
takes a total of $|S|$ group multiplications. Hence, the overall time complexity for 
the encryption algorithm is $O(N + \lambda \cdot |S|)$. The $O(N)$ operations in 
computing the $S_\ell$ subsets are individually much less costly than each of the $|S|$ 
group multiplications, but they still influence runtimes to an extent. Reading 
the table from top to bottom, we keep the total number of users in the system 
$N$ constant while increasing the subset size $|S|$. For the smaller values of $|S|$ 
(i.e., when N = 100, 1K, 10K), the slight increase in $|S|$ does not affect runtimes 
enough for the trend predicted by the big-O notation to become visible. But 
reading the table from left to right, we increase both $|S|$ and the value of $N$, 
which causes runtimes to increase as expected. 


Table 3. Encryption times for the Boneh-Gentry-Waters special case scheme. Let s 
denote seconds and ms denote milliseconds. 


Subset Size | Encryption time when number of system users N = 
            | 100     | 1K      | 10K     | 100K     | 1M 
1%          | 1.24 ms | 0.88 ms | 1.29 ms | 6.08 ms  | 87.24 ms 
5%          | 0.71 ms | 1.31 ms | 2.16 ms | 14.06 ms | 340.73 ms 
10%         | 1.26 ms | 1.32 ms | 2.40 ms | 16.35 ms | 664.76 ms 
20%         | 1.51 ms | 1.69 ms | 3.27 ms | 30.89 ms | 1.13 s 
50%         | 0.75 ms | 2.19 ms | 8.60 ms | 67.59 ms | 1.62 s 
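
The first encryption step for the general construction, splitting the privileged set into per-block subsets with hash-set lookups, can be sketched as below. This is a minimal illustration assuming users are numbered 1..N; the function name `block_subsets` is hypothetical and the code is not taken from the reference implementation.

```python
import math

def block_subsets(S, N, B):
    """Split S (a subset of 1..N) into per-block subsets, following [2, §3.2].

    Returns a list of (S_hat, S_l) pairs for l = 1..A, where
    S_hat = S ∩ {lB - B + 1, ..., lB} and S_l shifts S_hat into 1..B.
    """
    members = set(S)                 # hash set: O(1) membership tests
    A = math.ceil(N / B)             # number of blocks
    result = []
    for l in range(1, A + 1):
        lo, hi = (l - 1) * B + 1, min(l * B, N)
        s_hat = [x for x in range(lo, hi + 1) if x in members]   # O(B) scan
        s_l = [x - (l - 1) * B for x in s_hat]
        result.append((s_hat, s_l))
    return result

N = 25
B = math.isqrt(N)                    # B = floor(sqrt(N)) = 5
blocks = block_subsets({1, 2, 11, 25}, N, B)
```

Each of the A blocks is scanned in O(B) time regardless of how many privileged users it contains, which is where the O(N) term in the encryption cost comes from.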


Table 4. Encryption times for the Boneh-Gentry-Waters general scheme. Let ms 
denote milliseconds. 


Subset Size | Encryption time when number of system users N = 
            | 100     | 1K       | 10K      | 100K      | 1M 
1%          | 3.67 ms | 9.59 ms  | 29.62 ms | 77.61 ms  | 288.11 ms 
5%          | 3.62 ms | 8.14 ms  | 24.68 ms | 83.09 ms  | 297.83 ms 
10%         | 3.58 ms | 8.61 ms  | 26.60 ms | 86.36 ms  | 306.96 ms 
20%         | 2.77 ms | 6.90 ms  | 30.41 ms | 134.18 ms | 548.08 ms 
50%         | 4.43 ms | 13.09 ms | 47.39 ms | 145.08 ms | 897.48 ms 


3.2 Gentry-Waters: A Semi-static Variant of the BGW System 


In [15], Gentry and Waters introduce the notion of semi-static security, which 
lies between static security and adaptive security. They construct a semi-statically 
secure variant of [2]. In their system, the public key and private key both grow 
linearly with the total number of system users, but the ciphertext sizes are 
constant. We implemented their semi-static scheme in the asymmetric (Type-III) 
pairing setting and optimized it for efficiency, with details in the full version. 
In Table 5, we give the encryption times. We notice a general trend that reading 
the table from left to right, encryption times increase. For the larger values of 
N, encryption times also generally increase when reading the table from top to 
bottom. In both of these scenarios, the subset size is increasing dramatically, 
which is why we see such a great increase in encryption times. Encryption time 
is dominated by the time required to calculate $C_2 = (\prod_{j \in S} h_j)^t$, which requires 
$O(|S|)$ group multiplications over $G_1$. 


Table 5. Encryption times for the Gentry-Waters semi-static variant of the Boneh- 
Gentry-Waters scheme. Let ms denote milliseconds. 


Subset Size | Encryption time when number of system users N = 
            | 100     | 1K      | 10K     | 100K     | 1M 
1%          | 0.64 ms | 0.59 ms | 1.51 ms | 1.97 ms  | 13.00 ms 
5%          | 0.65 ms | 1.11 ms | 1.53 ms | 6.21 ms  | 62.30 ms 
10%         | 1.18 ms | 1.37 ms | 1.99 ms | 12.63 ms | 153.56 ms 
20%         | 0.80 ms | 0.80 ms | 2.95 ms | 20.99 ms | 200.77 ms 
50%         | 1.04 ms | 1.59 ms | 5.46 ms | 65.02 ms | 486.97 ms 


3.3 Waters Dual System Broadcast Encryption System 


We implement a broadcast encryption system that is secure against adaptive 
adversaries, described in [23]. We remind readers that the adaptive security 
provided by this scheme is stronger than the static and semi-static security of 
the schemes implemented in Sects. 3.1 and 3.2, respectively. In this system, the 
ciphertext sizes are constant, but the public key and private key sizes grow 
linearly with the total number of system users. This system, like the others, 
was originally written in the symmetric pairing setting. In the full version, we 
describe how we implemented it in the Type-III pairing setting, strategically 
choosing which group elements to place in $G_1$ and $G_2$ to maximize efficiency. 

In Table 6, we show the encryption times from our implementation, which 
are dominated by the computation of $E_1 = (\prod_{i \in S} u_i)^t$, which requires $O(|S|)$ 
group multiplications over $G_1$. 


Table 6. Encryption times for the Waters Dual Broadcast System. Let ms denote 
milliseconds. 


Subset Size | Encryption time when number of system users N = 
            | 100     | 1K      | 10K     | 100K     | 1M 
1%          | 2.01 ms | 2.23 ms | 2.97 ms | 3.35 ms  | 14.22 ms 
5%          | 2.02 ms | 2.57 ms | 4.39 ms | 7.70 ms  | 73.20 ms 
10%         | 2.03 ms | 2.38 ms | 3.44 ms | 14.55 ms | 131.80 ms 
20%         | 2.06 ms | 3.26 ms | 4.49 ms | 22.90 ms | 239.53 ms 
50%         | 2.61 ms | 3.37 ms | 8.22 ms | 79.07 ms | 494.90 ms 


3.4 Comparison of General Broadcast Encryption Systems 


We now compare the broadcast encryption systems that we describe in Sects. 3.1, 
3.2, and 3.3 to each other, and to the baseline scheme which we describe in 
Sect. 2. We perform a runtime evaluation based on experimental values for setup, 
encryption, key generation (when applicable), and decryption. Based on these 
values, we then count individual operations and construct asymptotic runtime 
tables for each of the functions in each scheme. We only compare the runtimes 
based on the runtime tables that we construct in this paper, but we refer the 
reader to our implementation to view all of the runtimes. We also do a size 
evaluation based on the actual sizes of the group elements in $G_1$, $G_2$, and $G_T$ over 
the curve BN254, given in Table 8. We have already given the theoretical sizes 
of the public key, private key, and ciphertext for each scheme at the beginning 
of this section in Table 2. 
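
As a rough cross-check of the size evaluation, ciphertext sizes can be recomputed from the element sizes given at the start of this section. The sketch below rests on assumptions that are not stated explicitly in the text: that a baseline group element is stored in 32 bytes, that a [2, §3.1] ciphertext holds one $G_1$ and one $G_2$ element, and that a [2, §3.2] ciphertext holds one $G_2$ element plus $A = \lceil N/B \rceil$ elements of $G_1$. Under these assumptions the results agree with the ct rows of Table 8 (for example 352 B, 96 B, and 384 B at N = 100).

```python
import math

G1_BYTES, G2_BYTES = 32, 64      # BN254 element sizes reported in Sect. 3
EC_BYTES = 32                    # assumed size of one NIST P-256 element

def ct_baseline(subset_size):
    # g^y plus one z_i per privileged user
    return EC_BYTES * (subset_size + 1)

def ct_bgw_special():
    # assumed layout: one G2 element and one G1 element
    return G2_BYTES + G1_BYTES

def ct_bgw_general(n_users):
    # assumed layout: one G2 element plus A elements of G1, A = ceil(N / B)
    b = math.isqrt(n_users)
    a = math.ceil(n_users / b)
    return G2_BYTES + a * G1_BYTES

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(n, ct_baseline(n // 10), ct_bgw_special(), ct_bgw_general(n))
```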


Setup Times. We start by analyzing the setup times presented in Table 7. For 
all the pairing-based schemes, the setup phase requires computing a public key 
PK and master secret key MSK. Computing the master secret key takes a 
negligible, constant amount of time, but the time that it takes to compute the public 
key varies. The baseline scheme setup phase requires $O(N)$ exponentiations over 
the elliptic curve group $G$ to calculate a linear-sized PK. On the other hand, [2, 
§3.2] requires $O(\sqrt{N})$ exponentiations over $G_1$ to calculate a public key of size 
$O(\lambda \cdot \sqrt{N})$. All the other pairing-based schemes require on the order of $O(N)$ 
operations over $G_1$ to calculate a linear-sized public key. In our implementation, 
individual group operations over the elliptic curve group $G$ (NIST P-256) 
used for the baseline were found to be faster than operations over $G_1$ used in the 
pairing-based schemes. This explains why the baseline setup times are faster 
than those for all of the pairing-based schemes except [2, §3.2]. 

Setup times for [2, §3.2] are faster than those for the baseline 
when N ≥ 1K, but not when N = 100. This is because when N = 100, the 
difference in the total number of exponentiations computed during setup for the 
baseline and [2, §3.2] is negligible. Hence, faster setup times for the baseline can 
be attributed to the faster time for individual exponentiations over $G$ compared to 
$G_1$. When N ≥ 1K, though, [2, §3.2] has faster setup times because calculating 
the public key requires far fewer exponentiations than for the baseline. Even 
though individual exponentiations are still faster over $G$ in the baseline scheme, 
the sheer number of exponentiations required to calculate the public key has 
increased to an extent that it results in slower setup times. 
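
To make the gap in exponentiation counts concrete, the following sketch tabulates them for the baseline and for [2, §3.2]. It assumes the Setup operation counts listed for these two schemes in Table 9 (N exponentiations for the baseline; 2B + A − 1 over $G_1$ plus B over $G_2$ for [2, §3.2]) and the choice $B = \lfloor \sqrt{N} \rfloor$; the function name is hypothetical.

```python
import math

def setup_exponentiations(n_users):
    """Exponentiation counts during Setup, per the operation counts in Table 9."""
    b = math.isqrt(n_users)            # B = floor(sqrt(N))
    a = math.ceil(n_users / b)         # A = number of blocks
    baseline = n_users                 # one h_i = g^{x_i} per user
    bgw_general = (2 * b + a - 1) + b  # G1 exponentiations + G2 exponentiations
    return baseline, bgw_general

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    base, gen = setup_exponentiations(n)
    print(f"N={n:>9}: baseline {base:>9}, [2, §3.2] {gen:>6}")
```

At N = 100 the two counts are of the same order (100 versus 39), while at N = 1M they differ by more than two orders of magnitude, which is consistent with the crossover described above.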


Encryption Times. At first glance, it might seem surprising that the 
encryption times for most of the pairing-based schemes are consistently faster 
than those for the baseline. But if we look at the baseline construction from 
Sect. 2, we notice that during encryption, we have to calculate $z_i = (h_i)^y \cdot m$ for 
each $i \in S$, in addition to $g^y$. Overall, this takes $|S| + 1$ group exponentiations 
and $|S|$ multiplications. Just like the baseline, all of the pairing-based schemes 
compute a part of the ciphertext with $O(|S|)$ group multiplications. But 
for all of the pairing-based schemes except [2, §3.2], the total number of 
exponentiations computed during encryption is constant. In [2, §3.1] and [15, §3.1], 
we only need one group exponentiation each time to compute $C_0 = (g_2)^t$. In 
[23], we have exactly six exponentiations over $G_1$ and six over $G_2$ every time 
we compute the ciphertext. This makes the total time for encryption for these 
schemes less than that for the baseline as the value of N increases. We explain 
the results in more detail in a series of observations: 


– When N = 100, the baseline encryption is the most efficient, even though 
it requires computing more group exponentiations than the other schemes. 
This is because group operations over the elliptic curve group $G$ used for 
the baseline are much faster than the group operations in the pairing-based 
schemes, which outweighs the larger number of exponentiations. However, the 
number of exponentiations required 
for the baseline encryption increases linearly with N. So when N ≥ 1K, 
despite the faster group operations over $G$, the number of exponentiations 
increases sharply for the baseline, while it stays constant for all the 
pairing-based schemes except [2, §3.2]. Hence, all of the pairing-based schemes except 
[2, §3.2] have faster encryption times when N ≥ 1K. 

– If we compare the baseline to [2, §3.2], we notice that [2, §3.2] is only more 
efficient than the baseline when N ≥ 100K. We recall that encryption in [2, 
§3.2] requires a total of $|S|$ group multiplications over $G_1$, one exponentiation 
over $G_2$, and $A$ exponentiations over $G_1$, where $A \approx \sqrt{N}$. We also recall that 
in encryption for [2, §3.2], we need to compute $S_\ell$ for $\ell \in \{1, 2, \ldots, A\}$ by 
computing the intersection of integer subsets. This technically takes $O(N)$ time to 
run, but since iterating over and adding integers to subsets is much faster than 
multiplying or exponentiating group elements, this step in encryption is fairly 
rapid. When N = 100, 1K, 10K, the baseline scheme’s faster encryption 
times can be attributed to the efficiency of group operations over the elliptic 
curve group $G$. Combined with the time to compute $S_\ell$ for $\ell \in \{1, 2, \ldots, A\}$, 
the $O(\sqrt{N})$ exponentiations in encryption for [2, §3.2] take longer to 
compute than the $O(N)$ exponentiations for the baseline. This changes when 
N ≥ 100K. Now, the sheer number of exponentiations required for the 
baseline encryption has increased so greatly that the baseline encryption takes 
longer than that for [2, §3.2]. 

– When we compare the pairing-based schemes to each other, we see that [15, 
§3.1] consistently has the fastest encryption times. For the smaller values 
of N, the only other scheme that has encryption times nearly as fast is [2, 
§3.1]. As N grows larger, the encryption times for [23] grow closer to those 
for [15, §3.1] and [2, §3.1]. We recall that [15, §3.1] is actually a semi-static 
variant of [2, §3.1]. The encryption algorithms for both of these schemes are 
very similar, hence the similar runtimes. What is significant, though, is that a 
semi-statically secure broadcast encryption system achieved faster encryption 
times than its static counterpart. So far, we see that [15, §3.1] appears 
to be a well-performing system. It has the fastest encryption times, very fast 
setup times, and a moderately strong level of security. For adaptive security, 
[23] seems to be a very good option. The only downside to both of these 
schemes, as we will shortly see, is their large private key sizes. 
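
The observations above come down to how the number of exponentiations scales with |S|. The sketch below tabulates multiplication and exponentiation counts per encryption, assuming the Enc operation counts reported in Table 9 (and the six-plus-six exponentiations stated for [23]); [2, §3.2] is omitted because its count also depends on the block parameter. The function name is hypothetical.

```python
def enc_op_counts(subset_size):
    """(multiplications, exponentiations) per scheme, following Table 9."""
    s = subset_size
    return {
        "baseline":   (s, s + 1),      # |S| mults, |S| + 1 exps
        "[2, §3.1]":  (s, 2),          # one exp in G1, one in G2
        "[15, §3.1]": (s, 2),
        "[23]":       (s + 2, 12),     # six exps in G1, six in G2
    }

for n in (100, 1_000, 10_000):
    counts = enc_op_counts(n // 10)    # |S| is 10% of N, as in Table 7
    print(n, counts)
```

Only the baseline's exponentiation count grows with |S|; for the three pairing-based schemes listed it stays constant, so their cost is eventually dominated by the cheaper multiplications.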


Key Generation Times. The baseline scheme, [2, §3.1], and [2, §3.2] are all 
secure against static adversaries. Private key sizes are constant, and therefore 
single-user key generation times are constant. In order to achieve semi-static 
and adaptive security, though, the private key size must be expanded. The key 
generation algorithms for [15, §3.1] and [23] generate much larger private keys of 
size $O(\lambda \cdot N)$. The problem that we found in our implementation is that these key 
generation algorithms take very long to run. In both [15, §3.1] and [23], it takes 
more than 1.5 min to generate a key for a single user when the total number 
of system users N = 1M. For these two schemes, reading the key generation 
runtimes from left to right, we see that they increase linearly with the number 
of users in the system. This makes sense because it requires $O(N)$ operations 
over $G_1$ to generate a single user’s linear-sized private key. In [15, §3.1], we need 
N + 1 exponentiations over $G_1$ to calculate the private key for a single user. 
Key generation for [23] is similar, but slightly slower. In addition to the N + 1 
exponentiations over $G_1$, the system needs to calculate $D_1, D_2, \ldots, D_7$. It would 
take up a lot of time and space to generate and store the private keys for a large 
subset of privileged users. The keys for all the privileged users in $S$ would take 
up $O(\lambda \cdot N \cdot |S|)$ space. 
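
A short calculation shows how quickly this storage grows. The sketch assumes that a [15, §3.1] private key consists of N elements of $G_1$ plus one element of $G_2$, an assumption that reproduces the sk sizes reported for that scheme in Table 8 (3.26 KB at N = 100, 32.06 KB at N = 1K); the function names are hypothetical.

```python
G1_BYTES, G2_BYTES = 32, 64   # BN254 element sizes reported in Sect. 3

def gw_private_key_bytes(n_users):
    # assumed layout: N elements of G1 plus one element of G2
    return n_users * G1_BYTES + G2_BYTES

def total_key_storage_bytes(n_users, subset_size):
    # storing one private key per privileged user: O(lambda * N * |S|)
    return gw_private_key_bytes(n_users) * subset_size

n = 1_000_000
print(gw_private_key_bytes(n) / 1e6, "MB per user")
print(total_key_storage_bytes(n, n // 10) / 1e12, "TB for 10% of users")
```

With N = 1M and 10% of users privileged, the keys alone occupy roughly 3.2 TB under these assumptions.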


Decryption Times. The only two schemes for which the decryption times 
remained relatively constant as the total number of system users increased were 
the baseline scheme and [2, §3.2]. For the baseline scheme, decryption does not 
require any pairings. The decryption algorithm runs in constant time because 
only a single division needs to be computed ($m = z_i / c^{x_i}$), regardless of the 
value of N. In [2, §3.2], since we break up our broadcast encryption system 
into $\sqrt{N}$ instances, we only have to use a single one of those instances (which 
we created during encryption) to decrypt the message. It is a tradeoff: slower 
encryption times to calculate each of the instances, but approximately constant 
decryption times. All of the pairing-based schemes except [23] required only two 
pairings to be computed during decryption, whereas [23] required nine. This 
large number of pairings explains why the decryption times are consistently the 
slowest for this system for N ≤ 100K. We again see that the decryption times 
for [2, §3.1] are similar to those of its semi-static counterpart, [15, §3.1]. This 
makes sense: for both, decryption times are dominated by a step that requires 
$|S| - 1$ group multiplications over $G_1$. 


Table 7. Time evaluation for general public key broadcast encryption systems. The 
baseline scheme was implemented using NIST P-256 in OpenSSL, while the 
pairing-based schemes used curve BN254 in MCL. The KeyGen and Dec times represent the 
cost for a single user, while the Setup is the cost to initialize the entire system and 
Enc is the cost to encrypt to an arbitrary 10% of the system users. Let ms denote 
milliseconds, s denote seconds, and min denote minutes. 


Item   | Scheme     | Sect. | Time when number of system users N = 
       |            |       | 100      | 1K        | 10K       | 100K      | 1M 
Setup  | baseline   | 2     | 4.46 ms  | 27.39 ms  | 252.65 ms | 2.40 s    | 23.21 s 
Setup  | [2, §3.1]  | 3.1   | 51.47 ms | 456.97 ms | 4.94 s    | 40.06 s   | 7.06 min 
Setup  | [2, §3.2]  | 3.1   | 14.09 ms | 15.75 ms  | 58.81 ms  | 158.08 ms | 762.45 ms 
Setup  | [15, §3.1] | 3.2   | 35.00 ms | 69.40 ms  | 321.98 ms | 2.80 s    | 29.44 s 
Setup  | [23]       | 3.3   | 32.39 ms | 34.45 ms  | 297.71 ms | 3.05 s    | 29.71 s 
KeyGen | baseline   | 2     | -        | -         | -         | -         | - 
KeyGen | [2, §3.1]  | 3.1   | 0.19 ms  | 0.18 ms   | 0.09 ms   | 0.11 ms   | 0.12 ms 
KeyGen | [2, §3.2]  | 3.1   | 0.11 ms  | 0.10 ms   | 0.14 ms   | 0.12 ms   | 0.16 ms 
KeyGen | [15, §3.1] | 3.2   | 11.07 ms | 164.14 ms | 954.00 ms | 9.75 s    | 1.77 min 
KeyGen | [23]       | 3.3   | 11.05 ms | 104.49 ms | 954.14 ms | 9.78 s    | 1.60 min 
Enc    | baseline   | 2     | 0.39 ms  | 2.85 ms   | 25.52 ms  | 248.08 ms | 2.81 s 
Enc    | [2, §3.1]  | 3.1   | 1.26 ms  | 1.32 ms   | 2.40 ms   | 16.35 ms  | 664.76 ms 
Enc    | [2, §3.2]  | 3.1   | 3.58 ms  | 8.61 ms   | 26.60 ms  | 86.36 ms  | 306.96 ms 
Enc    | [15, §3.1] | 3.2   | 1.18 ms  | 1.37 ms   | 1.99 ms   | 12.63 ms  | 153.56 ms 
Enc    | [23]       | 3.3   | 2.03 ms  | 2.38 ms   | 3.44 ms   | 14.55 ms  | 131.80 ms 
Dec    | baseline   | 2     | 0.04 ms  | 0.03 ms   | 0.03 ms   | 0.05 ms   | 0.07 ms 
Dec    | [2, §3.1]  | 3.1   | 2.60 ms  | 1.92 ms   | 2.78 ms   | 15.36 ms  | 349.52 ms 
Dec    | [2, §3.2]  | 3.1   | 2.13 ms  | 1.49 ms   | 2.12 ms   | 2.46 ms   | 1.63 ms 
Dec    | [15, §3.1] | 3.2   | 2.82 ms  | 2.82 ms   | 2.93 ms   | 16.56 ms  | 261.98 ms 
Dec    | [23]       | 3.3   | 6.37 ms  | 6.44 ms   | 7.49 ms   | 23.05 ms  | 157.66 ms 


Overall Runtime Comparison. If we consider key generation a step in the 
decryption process, then [15, §3.1] and [23] by far have the slowest runtimes. 
But recall that these are the only two systems that are secure against non-static 
adversaries. It is a tradeoff: in order to achieve the higher level of security, the 
decryption will be slower. 


Table 8. Space evaluation for general public key broadcast encryption systems. We 
set |S|, the size of the set of users a ciphertext is encrypted to, to be an 
arbitrary 10% of the total number of system users. The baseline scheme was implemented 
using NIST P-256 in OpenSSL, while the pairing-based schemes used curve BN254 in 
MCL. Let B denote bytes, KB denote kilobytes, and MB denote megabytes. 


Item | Scheme     | Sect. | Space when number of system users N = 
     |            |       | 100      | 1K        | 10K       | 100K      | 1M 
pk   | baseline   | 2     | 3.23 KB  | 32.03 KB  | 320.03 KB | 3.20 MB   | 32.00 MB 
pk   | [2, §3.1]  | 3.1   | 12.90 KB | 128.10 KB | 1.28 MB   | 12.80 MB  | 128.00 MB 
pk   | [2, §3.2]  | 3.1   | 1.66 KB  | 5.09 KB   | 16.06 KB  | 50.66 KB  | 160.06 KB 
pk   | [15, §3.1] | 3.2   | 3.68 KB  | 32.48 KB  | 320.48 KB | 3.20 MB   | 32.00 MB 
pk   | [23]       | 3.3   | 4.19 KB  | 32.99 KB  | 320.99 KB | 3.20 MB   | 32.00 MB 
sk   | baseline   | 2     | 32.00 B  | 32.00 B   | 32.00 B   | 32.00 B   | 32.00 B 
sk   | [2, §3.1]  | 3.1   | 32.00 B  | 32.00 B   | 32.00 B   | 32.00 B   | 32.00 B 
sk   | [2, §3.2]  | 3.1   | 32.00 B  | 32.00 B   | 32.00 B   | 32.00 B   | 32.00 B 
sk   | [15, §3.1] | 3.2   | 3.26 KB  | 32.06 KB  | 320.06 KB | 3.20 MB   | 32.00 MB 
sk   | [23]       | 3.3   | 3.49 KB  | 32.29 KB  | 320.29 KB | 3.20 MB   | 32.00 MB 
ct   | baseline   | 2     | 352.00 B | 3.23 KB   | 32.03 KB  | 320.03 KB | 3.20 MB 
ct   | [2, §3.1]  | 3.1   | 96.00 B  | 96.00 B   | 96.00 B   | 96.00 B   | 96.00 B 
ct   | [2, §3.2]  | 3.1   | 384.00 B | 1.12 KB   | 3.26 KB   | 10.21 KB  | 32.06 KB 
ct   | [15, §3.1] | 3.2   | 96.00 B  | 96.00 B   | 96.00 B   | 96.00 B   | 96.00 B 
ct   | [23]       | 3.3   | 861.00 B | 861.00 B  | 861.00 B  | 861.00 B  | 861.00 B 

The encryption times for [15, §3.1] and [23] were comparable to, if not 
better than, the encryption times for the systems which were secure against static 
adversaries. The setup times were faster because the N private keys were not 
computed during the setup phase. The main downsides of both of these 
systems are the long key generation times and the large private key sizes. 

Looking at Table 8, if the primary goal of the broadcast system 
is to achieve short public and private keys and efficient decryption times, 
then we recommend using [2, §3.2]. This is the only scheme that achieves public 
key and ciphertext sizes of $O(\lambda \cdot \sqrt{N})$. The private key sizes are constant. Even 
though the setup times for this scheme are not more efficient than those for [15, 
§3.1] and [23], they are still very fast in comparison to [2, §3.1]. Additionally, 
the public and private keys in [15, §3.1] and [23] are all of size $O(\lambda \cdot N)$, 
which is very large. But we recall that while [2, §3.1] and [2, §3.2] are secure only 
against static adversaries, [15, §3.1] is secure against semi-static adversaries and 
[23] is secure against adaptive adversaries. If we judge these schemes only by the 
efficiency of their decryption times and public/private key sizes, and we desire 
a stronger level of security, then we recommend [15]. In general, the decryption 
times for [15, §3.1] are much faster because they only require two applications of 
the pairing algorithm, while [23] requires nine pairings in decryption. 
Nevertheless, as the total number of system users N grows larger, the decryption times 
for [23] approach the times for [15, §3.1]. So if we have a small total number of 
users in our system, we recommend [15, §3.1]. But if the value of N is very large, 
then [23] will perform equally well during decryption. And because the adaptive 
security provided by [23] is stronger than the semi-static security provided by 
[15, §3.1], we especially recommend [23] when the total number of system users 
N ≥ 1M. 

If the primary goal of our broadcast system is to achieve efficient encryption 
times, then we recommend any pairing-based system except [2, §3.2]. Then, 
depending on the desired level of security, we would choose either the statically 
secure [2, §3.1], the semi-statically secure [15, §3.1], or the adaptively secure 
[23] broadcast system. 


Further Theoretical Analysis. For further theoretical analysis, we denote by $\lambda_1$ 
and $\lambda_2$ a single group multiplication operation over $G_1$ and $G_2$, respectively. 
Exponentiations are denoted by $\lambda_1^3$ and $\lambda_2^3$. We let $e$ denote a single pairing 
operation. For the baseline scheme, we simply use $\lambda_0$ and $\lambda_0^3$ to represent a 
single multiplication and exponentiation over the elliptic curve group $G$, 
respectively. Here, we assume the time taken for a single group multiplication is $O(\lambda)$ 
and the time for a single exponentiation is $O(\lambda^3)$. As an example, if we write 
$O(X + \lambda \cdot Y)$, then we mean that the runtime for this algorithm is dominated by 
$O(Y)$ group multiplications (over $G_1$ or $G_2$) and $X$ miscellaneous $O(1)$ 
operations that individually take much less time than single group multiplications or 
exponentiations. 
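
As a worked instance of this notation, consider baseline encryption, which Sect. 2 shows to require $|S| + 1$ exponentiations and $|S|$ multiplications over $G$. Writing the cost as $T_{\mathrm{Enc}}^{\mathrm{baseline}}$ (a symbol introduced here only for illustration) and substituting the assumed per-operation costs gives:

```latex
\begin{align*}
T_{\mathrm{Enc}}^{\mathrm{baseline}}
  &= \lambda_0^3 \cdot (|S| + 1) + \lambda_0 \cdot |S| \\
  &= O(\lambda^3 \cdot |S| + \lambda \cdot |S|).
\end{align*}
```

The first line is an operation count and the second is the corresponding asymptotic runtime.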

In Table 9, there are a few operations that we did not count. We did not 
count multiplications or exponentiations over $G_T$ because they did not 
significantly impact runtimes in any of the schemes. We also did not count any 
addition/subtraction operations over $\mathbb{Z}_p$ because they were only used to compute 
$s = s_1 + s_2$ and $r = r_1 + r_2$ in the [23] broadcast system. Additionally, 
runtimes for the setup phase for [23] and [15, §3.1] were dominated by choosing 
$N$ random generators in $G_1$. Since we did not define a symbol for choosing a 
random generator as an “operation”, this is not shown in Table 9. When we use 
big-O notation to describe the time that the setup phase took for these two 
schemes (see Table 10), we use $\delta_1$ to denote the time required to choose a single 
random generator in $G_1$. Other than for these two setup functions, all of the 
time complexities for the schemes can easily be derived from Table 9. The big-O 
notation is best to refer to if we want to know which operation(s) dominate 
runtimes, but for total runtime details Table 7 is better. 
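

Table 9. Operation counts, where N is the total number of users in the system. 


Scheme     | Sect. | Setup | KeyGen | Enc | Dec 
baseline   | 2     | $\lambda_0^3 \cdot N$ | - | $\lambda_0^3 \cdot (|S| + 1) + \lambda_0 \cdot |S|$ | $2 \cdot \lambda_0^3 + \lambda_0$ 
[2, §3.1]  | 3.1   | $\lambda_1^3 \cdot (2N - 1) + \lambda_2^3 \cdot N + e$ | $\lambda_1^3$ | $\lambda_1 \cdot |S| + \lambda_1^3 + \lambda_2^3$ | $\lambda_1 \cdot (|S| - 1) + 2 \cdot e$ 
[2, §3.2]  | 3.1   | $\lambda_1^3 \cdot (2B + A - 1) + \lambda_2^3 \cdot B + e$ | $\lambda_1^3$ | $\lambda_1^3 \cdot A + \lambda_1 \cdot |S| + \lambda_2^3$ | $\lambda_1 \cdot |S_a| + 2 \cdot e$ 
[15, §3.1] | 3.2   | $2 \cdot e + \lambda_1^3 + \lambda_2^3$ | $\lambda_1^3 \cdot N + \lambda_2^3$ | $\lambda_1 \cdot |S| + \lambda_1^3 + \lambda_2^3$ | $\lambda_1 \cdot (|S| - 1) + 2 \cdot e$ 
[23]       | 3.3   | $7 \cdot \lambda_1^3 + 6 \cdot \lambda_2^3 + 2 \cdot e + 2 \cdot \lambda_1$ | $\lambda_1^3 \cdot (N + 8) + 2 \cdot \lambda_2^3 + 5 \cdot \lambda_1$ | $\lambda_1 \cdot (|S| + 2) + 6 \cdot \lambda_1^3 + 6 \cdot \lambda_2^3$ | $\lambda_1 \cdot (|S| - 1) + 9 \cdot e$ 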

Table 10. Theoretical runtimes, where N is the total number of users in the broadcast 
system. 


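Scheme     | Sect. | Setup | KeyGen | Enc | Dec 
baseline   | 2     | $O(\lambda^3 \cdot N)$ | - | $O(\lambda^3 \cdot |S| + \lambda \cdot |S|)$ | $O(\lambda^3)$ 
[2, §3.1]  | 3.1   | $O(\lambda^3 \cdot N)$ | $O(\lambda^3)$ | $O(\lambda \cdot |S|)$ | $O(\lambda \cdot |S|)$ 
[2, §3.2]  | 3.1   | $O(\lambda^3 \cdot \sqrt{N})$ | $O(\lambda^3)$ | $O(N + \lambda^3 \cdot \sqrt{N} + \lambda \cdot |S|)$ | $O(\lambda \cdot |S_a|)$ 
[15, §3.1] | 3.2   | $O(\delta_1 \cdot N)$ | $O(\lambda^3 \cdot N)$ | $O(\lambda \cdot |S|)$ | $O(\lambda \cdot |S|)$ 
[23]       | 3.3   | $O(\delta_1 \cdot N)$ | $O(\lambda^3 \cdot N)$ | $O(\lambda \cdot |S|)$ | $O(\lambda \cdot |S|)$ 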


On Identity-Based and Tracing Broadcast Encryption Systems. In the 
full version [4], we compare the identity-based broadcast encryption systems 
from [15, §4.1], [15, §4.3.1], and [14, §3.1]. We also compare a wide range of 
systems that support tracing, including a private-linear broadcast encryption 
(PLBE) system from [12], an augmented broadcast encryption (ABBE) system 
from [13], and a risky broadcast and trace multi-scheme from [24]. We defer these 
details to [4] for space reasons. 


4 Applications of Broadcast Encryption 


Online Video Streaming. The most commonly referenced use case for broadcast 
encryption is online video streaming services like Disney+, Netflix, and Hulu. 
This category can also include content streamed by individuals on platforms 
like Twitch and YouTube, online conferencing services like Zoom and Microsoft 
Teams and even many social media platforms like Facebook, Instagram and Tik- 
Tok. Users with permission are given access to a myriad of different videos, and 
ideally bandwidth usage and client side decryption processing requirements need 
to be minimized so that users can watch the videos in real time and can watch 
those videos on any device, regardless of processing capability. The user numbers 
for these services are vast. During the second quarter of 2020 [6], Netflix and 
Hulu had at least 190 million and 30 million users respectively. For these media 
streaming cases, we would recommend using the classic Boneh-Gentry-Waters [2] 
scheme as it provides the best combination of short ciphertexts and fast 
decryption times even for large user sets. For N = 1 million users, the [2, §3.1] 
variant provides the best ciphertext size at 96 B per ciphertext while decryption 
takes 350 ms, and the [2, §3.2] variant provides the fastest decryption at 1.6 ms 
with a 32 KB ciphertext. Either of these is a reasonable choice, although if the 
ciphertext size isn’t a problem, we’d recommend [2, §3.2] due to its smaller 
public key size (of 160 KB, whereas [2, §3.1] requires 128 MB for 1 million users). 
For very large user datasets (e.g., in the 190 million range), using [2, §3.2] 
instead of [2, §3.1] becomes even more important, as the former’s public keys 
scale with √N while the latter’s scale with N. Both [2] schemes were proven 
secure in the static security model; if one wants the stronger adaptive security, 
Waters [23] offers this and small 861B ciphertexts, although the public key sizes 
grow to 32 MB for N = 1 million. Both of the identity-based systems [14,15] 
require hours to decrypt a single ciphertext when N is 1 million, so they are not 
contenders. 
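
To make the scaling argument concrete, the following back-of-the-envelope sketch extrapolates the public key sizes quoted above from N = 1 million to N = 190 million. It assumes, purely for illustration, that the [2, §3.1] public key grows exactly linearly in N and that the [2, §3.2] public key grows exactly with √N; the function names are hypothetical and the results are rough estimates, not measurements.

```python
import math

# Public key sizes quoted in the text for N = 1 million users.
BASE_N = 1_000_000
PK_BGW_31_BYTES = 128 * 10**6  # [2, §3.1]: 128 MB, assumed to scale linearly with N
PK_BGW_32_BYTES = 160 * 10**3  # [2, §3.2]: 160 KB, assumed to scale with sqrt(N)


def extrapolate_linear(n_users: int) -> float:
    """Estimated [2, §3.1] public key size in bytes under pure linear scaling."""
    return PK_BGW_31_BYTES * (n_users / BASE_N)


def extrapolate_sqrt(n_users: int) -> float:
    """Estimated [2, §3.2] public key size in bytes under pure sqrt(N) scaling."""
    return PK_BGW_32_BYTES * math.sqrt(n_users / BASE_N)


if __name__ == "__main__":
    n = 190_000_000
    print(f"[2, §3.1] at N = {n:,}: about {extrapolate_linear(n) / 10**9:.1f} GB")
    print(f"[2, §3.2] at N = {n:,}: about {extrapolate_sqrt(n) / 10**6:.1f} MB")
```

Under these assumptions the linear-size key would reach roughly 24 GB at 190 million users, while the √N-size key would stay near 2 MB.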
The performance hit from [12] (the best performing traitor tracing scheme) 
vs. [2] could be worth it for the chance to combat revenue reducers like piracy. 
For N = 1 million users, the decryption time of [12] doubles (over [2, §3.2]) 
to 3.2 ms while the ciphertext size grows by a factor of 19 to 605 KB, which is 
larger but reasonable on fast networks. The public key size roughly triples to 
477 KB. Zhandry’s risky traitor tracing scheme [24, §9.3] provides the best 
ciphertext size for tracing schemes at only 384 B, but the decryption time 
explodes to an infeasible 19 h (for N = 1 million). Zhandry’s nonrisky, post-user 
expansion compiler version of his scheme (see [4] for details) has decryption 
times that are comparable to those for [12] and [13] for N = 1 million, but the 
encryption time balloons to over 1 min and the ciphertext size jumps from roughly 
600 KB to almost 4 MB (details in [4] when a = 2/3). One potential benefit is that the 
public/private key sizes of Zhandry’s scheme are smaller, but that likely won’t 
offset the additional encryption and space overhead. 
Constraint Summary: Needs to scale to 1 million users or more, with small 
ciphertext size overhead and fast (client side) decryption times. Encryption 
times are less of a concern, but traitor tracing may be needed. 


Recommendation: Use [2, §3.2] (for fastest decryption and scalable public key 
size). See Sect. 3.1. If traitor tracing is required, use [12]. 


Online Game Streaming. Online game streaming is another form of media 
streaming that is becoming increasingly prevalent. Users receive high quality 
(resolution and frames per second) game data and give the server data like their 
keystrokes and mouse clicks in game. This system allows users to play games that 
have a performance requirement beyond what their client side device is capable 
of. Currently the way that online game streaming is done is that a server runs 
the game program remotely for each individual user. However, if adapted, a 
multiplayer game could feasibly send the same stream out to every user, 
maximizing both server-side and client-side efficiency. 
In comparison to video streaming, online game streaming has a more stringent 
data speed requirement, as any wasted time could result in a subpar player 
experience. The most popular service, Nvidia GeForce Now, has around a million 
users [20], but it is a growing industry and game streaming services could 
eventually reach user numbers on par with video streaming services. For 
this case, we would recommend [2, §3.2] or [23]. Both schemes have ciphertexts 
under 1 KB even for 1 million users, but the primary cost for both is a jump 
in public key size of 128 MB and 32 MB respectively. The scheme of [23] offers 
stronger provable security and low encryption/decryption times, while [2, §3.2] 
offers decryption times that are two orders of magnitude faster but an encryption 
time that is roughly triple that of [23]. 

While the live traitor tracing functionality could be useful, the difference in 
performance could make a notable difference for users. Perhaps [12] could be 
used in situations where most of the game data is preloaded on the client side, 
and the live data is sent out live and unencrypted or from a faster performing 
scheme like [2]. This could be a hybrid combination, allowing usage of the traitor 
tracing functionality, and ensuring fast enough performance. We note that the 
baseline ElGamal scheme takes almost 3s to encrypt the payload for 1 million 
users, which likely rules this out for live gaming applications, highlighting the 
power of broadcast encryption for this setting. 


Constraint Summary: Needs to scale to 1 million users or more, with a combi- 
nation of ciphertext size overhead and (client side) encryption and decryption 
times that support live interactions. Need to balance benefits of tracing with 
impact on user experience. 


Recommendation: Use [23] (for strong overall balance of security, low size over- 
head and fast encryption/decryption times) or if more speed in one component 
is needed, use [2, §3.2] (for fastest decryption) or [2, §3.1] (for fastest transmis- 
sion). See Tables 7 and 8. The overhead required for traitor tracing may frustrate 
the live gaming experience, but if it is needed, a hybrid approach using [12] may 
work. See [4] for more details. 


Live Sports Betting. A novel use case for broadcast encryption arrives with 
the emergence of live sports betting. Due to the new developments in wireless 
data speeds with the emergence of 5G technologies, some major companies are 
developing capabilities for in-person spectators to make bets on their mobile 
phones throughout a game, utilizing continually updating betting lines given 
the events happening within the game. Broadcast encryption could be used to 
quickly send out information to users about how much current bets are worth to 
cash out and the current betting lines, all in realtime. Additionally, some of these 
services may include a live broadcast, which could be different from the public 
broadcast (i.e., a bettor-specific broadcast). In this use case, the total speed is 
the most important factor (making the encryption time more relevant here), and 
the total number of users is within a pretty regular range (N = 30,000 to 70,000), 
which is much smaller than the user amounts in some other use cases. Like with 
online game streaming, broadcast encryption offers real performance savings over 
individually encrypting with ElGamal; when N = 100,000 the encryption plus 
decryption time of ElGamal is roughly 10 times that of [15] or [2, §3.1]. For the 
N < 100,000 range, the public keys of [2, §3.1] are 13 MB, while the public keys 
of [15] are a more tolerable 3 MB. Systems [15] and [2, §3.1] tie for the shortest 
ciphertexts at 96 B. The fastest encryption plus decryption time is [15] for this 
user level (and this holds over a range of sizes of allowed decrypter sets S from 
10% to 50% of N), although the difference (a few milliseconds) isn’t likely to be 
observable by a human. 


Constraint Summary: Looking for a sweet spot in the 10,000 to 100,000 user 
range, with a combination of ciphertext size overhead and (server side) encryp- 
tion and (client side) decryption times that support real-time interactions. 


Recommendation: Use [15] (for best ciphertext size, best sum of encryption and 
decryption time, and public key size tolerable for N < 100,000). See Sect. 3.2. 
The overhead required for traitor tracing may frustrate the live betting experi- 
ence, but if needed, a hybrid approach using [12] and [15] may work. 


Distributor Limited Applications. In the above applications, we assume that the 
distributor (e.g., Netflix, YouTube) has large computing resources at its disposal. 
However, we also anticipate use cases where distributor performance becomes a 
bottleneck (e.g., where a person is streaming video from their smartphone to 
a group). A distributor limited implementation could be relevant in both the 
private and public sector. Within the private sector, a company manager who 
wants to broadcast a specific message to his employees could do so using broad- 
cast encryption. With the presumed post-social-distancing increase in online 
work, consistent, secure communication between manager and employees could 
be increasingly important with a decline in face-to-face communication. 

Within the public and military sector, these same benefits apply. In the 
public sector, however, having differing levels of access and the ability to revoke 
access to messages and live communications is more important. For example, if 
a broadcast encryption system is used to send out orders to a group, and one 
of the recipient devices is captured then revocation is necessary. Additionally, 
traitor tracing functionality could be especially valuable. 

Thus, in the case of a direct peer-to-many-peer type of communication, the 
performance of the distributor system becomes relevant, making the times for 
the Setup, KeyGen and Encrypt functions more critical. In this situation, a 
simple recommendation is harder to make. [2, §3.2], for example, performs the 
best in the case of online video streaming, but if one person were streaming 
video from their smartphone directly to many peers, the encryption performance 
of [2, §3.2] would be much worse than that of [2, §3.1] and [15, §3.1]. Due to 
that constraint, if there are limited resources for the distributor, using 
[2, §3.2] isn’t a good choice. If there are fewer than 100 K peers, in a situation 
with a distributor bottleneck we’d recommend either [2, §3.1] or [15, §3.1]. The 
latter has much better performance in terms of encryption times, but the former 
is orders of magnitude faster during KeyGen. In a situation where many keys are 
regularly generated, [2, §3.1] would be preferable, but in cases where keys are 
generated less often [15, §3.1] will have the best performance, allowing the 
fastest encryption. 

The same consideration can be made for the traitor tracing schemes. When 
the number of users is around 1,000, [13] slightly outperforms [12] in both setup 
times and encryption times, with roughly the same decryption times and notably 
worse KeyGen times. In a situation with distributor performance constraints, 
where many keys are regularly generated, [12] will perform better. In a situation 
where keys are generated less often and there are fewer than 1,000 users, [13] can 
perform better. However, both of these schemes are outperformed by the baseline 
(individual encryption to each peer) until about the 10,000 user level. 


Constraint Summary: For N < 100,000 user range, looking to optimize the 
distributor functions without sacrificing much data transfer time or client per- 
formance. 


Recommendation: Use [2, §3.1] (if need to generate keys often) or [15, §3.1] 
(otherwise). See Tables 7 and 8. If traitor tracing is required for under 10,000 
users, the baseline (individual encryption) will likely outperform any of the trac- 
ing broadcast schemes. If traitor tracing is required for over 10,000 users, use 
[12]. See [4] for more details. 
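
The recommendations of this section can be collected into a small lookup helper. The sketch below is only an illustrative summary of the text above: the function name, the use-case labels and the parameter names are hypothetical, and the thresholds (10,000 users for tracing in the distributor-limited case) are the approximate ones stated in the recommendations.

```python
def recommend(use_case: str, tracing: bool = False,
              n_users: int = 1_000_000, frequent_keygen: bool = False) -> str:
    """Return the scheme suggested in Sect. 4 for a use case (illustrative only)."""
    if use_case == "video_streaming":
        return "[12]" if tracing else "[2, §3.2]"
    if use_case == "game_streaming":
        # The text suggests a hybrid approach if tracing is really needed.
        return "hybrid using [12]" if tracing else "[23]"
    if use_case == "sports_betting":
        return "hybrid using [12] and [15]" if tracing else "[15]"
    if use_case == "distributor_limited":
        if tracing:
            # Below roughly 10,000 users, individual encryption is expected to win.
            return "baseline" if n_users < 10_000 else "[12]"
        return "[2, §3.1]" if frequent_keygen else "[15, §3.1]"
    raise ValueError(f"unknown use case: {use_case}")


if __name__ == "__main__":
    print(recommend("video_streaming"))
    print(recommend("distributor_limited", tracing=True, n_users=5_000))
```

Such a table-driven summary is convenient for quick reference, but the concrete measurements in Tables 7 and 8 should drive any real deployment decision.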


Acknowledgments. The authors are grateful to Mark Zhandry for helpful interac- 
tions regarding his work [24] and Brent Waters for helpful discussions regarding prior 
work in broadcast encryption. 

Abstract. In this paper we present an optimized variant of Gentry, 
Halevi and Vaikuntanathan (GHV)’s Homomorphic Encryption (HE) 
scheme. Our scheme is appreciably more efficient than the original GHV 
scheme without losing its merits of the (multi-key) homomorphic prop- 
erty and matrix encryption property. In this research, we first measure 
the density for the trapdoor pairs that are created by using Alwen and 
Peikert’s trapdoor generation algorithm and Micciancio and Peikert’s 
trapdoor generation algorithm, respectively, and use the measurement 
result to precisely discuss the time and space complexity of the cor- 
responding GHV instantiations. We then propose a generic GHV-type 
construction with several optimizations that improve the time and space 
efficiency from the original GHV scheme. In particular, our scheme can 
achieve asymptotically optimal time complexity and avoid generating 
and storing the inverse of the used trapdoor. Finally, we present an 
instantiation that, by using a new set of (lower) bound parameters, has 
smaller key and ciphertext sizes than the original GHV scheme. 


Keywords: Homomorphic encryption · LWE · Matrix operations 


1 Introduction 


Background and Related Work. Homomorphic Encryption (HE) allows run- 
ning operations on ciphertexts so that decryptions match the results from the 
corresponding operations on plaintexts. HE has many interesting applications in 
the real world, e.g., computational private information retrieval [6] and 
indistinguishability obfuscation [5]. Since its introduction by Rivest, Adleman 
and Dertouzos [24] in 1978, HE research has had a long history in modern 
cryptography. Early HE systems focused on evaluating asymmetric encryption and 
supported only one operation over encrypted data, either addition or 
multiplication. This type of HE is referred to as partially HE. Typical examples 
include the additively HE schemes Goldwasser-Micali and Paillier, and the 
multiplicatively HE schemes RSA and ElGamal. 
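
As a concrete illustration of a partially (multiplicatively) homomorphic scheme, the following minimal sketch uses textbook RSA with tiny, insecure parameters. The primes, exponent and function names are chosen here only for illustration; real RSA deployments use padding, which deliberately destroys this property.

```python
# Toy textbook RSA (no padding) with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                    # modulus
PHI = (P - 1) * (Q - 1)      # Euler's totient of N
E = 17                       # public exponent, coprime to PHI
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Textbook RSA encryption of a message m in [0, N)."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Textbook RSA decryption."""
    return pow(c, D, N)


def eval_mul(c1: int, c2: int) -> int:
    """Multiply two ciphertexts; the result encrypts the product of the plaintexts mod N."""
    return (c1 * c2) % N


if __name__ == "__main__":
    m1, m2 = 7, 11
    c = eval_mul(encrypt(m1), encrypt(m2))
    # Decrypting the product of ciphertexts yields the product of plaintexts.
    assert decrypt(c) == (m1 * m2) % N
    print(decrypt(c))
```

Only multiplication is supported here: there is no way to add two plaintexts by operating on these ciphertexts, which is exactly the limitation that SHE and FHE schemes address.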

Breaking through the single operation homomorphism took a long time. 
The first step forward was given in 2005 by Boneh, Goh and Nissim (BGN 
for short) [6], who presented an additively HE scheme supporting one multi- 
plication. An HE scheme that can evaluate two types of operations but only 
for a subset of operations is referred to as a Somewhat Homomorphic Encryp- 
tion (SHE) scheme. The BGN scheme is the first SHE scheme. Breaking the 
security of this scheme is as hard as solving the subgroup-membership problem 
in composite-order groups that admit bilinear maps. Later Gentry, Halevi and 
Vaikuntanathan (GHV for short) in 2010 [10] proposed an additively HE scheme 
supporting one “direct” matrix multiplication. Notably, a “direct” matrix multi- 
plication here means an ordinary matrix multiplication that does not require any 
extra computation. Security of their scheme is based on the standard Learning 
With Errors (LWE) assumption (see Sect. 2.1). The GHV scheme can be regarded 
as an improvement of the BGN scheme and has several inherent advantages (see 
Sect. 2.3 for details of the GHV scheme). Specifically, one significant advantage 
is that there is a worst-case/average-case classical reduction from the standard 
LWE problem to the GHV security. Another important advantage is that the 
GHV scheme can encrypt messages from a large space (i.e., any matrix ring) 
and has no restriction on the output size. Moreover, the GHV scheme retains 
much of the flexibility of LWE-based cryptosystems, e.g., it can be made 
identity-based and leakage-resilient. In a nutshell, the GHV cryptosystem is still 
an outstanding SHE scheme. 

The first theoretically feasible construction capable of supporting arbitrary 
computations over ciphertexts, which is referred to as Fully HE (FHE), was intro- 
duced by Gentry in 2009 [9]. Since then, many FHE schemes have been proposed 
(e.g., [4, 7, 12, 14, 25, 27]). Generally speaking, the development of FHE until now 
involves three generations. Typical examples of the first generation are Gentry’s 
initial scheme based on ideal lattices [9] and van Dijk et al.’s proposal employing 
integer arithmetic [25]. The second generation includes Brakerski and Vaikun- 
tanathan’s constructions [4,7] that use new techniques to control the growth 
of noise. The third generation of FHE originates from the scheme of Gentry, 
Sahai and Waters (GSW for short) [12], which exhibits a somewhat distinct 
noise growth pattern. Although there has been great theoretical and practical 
progress on FHE, for many applications, especially those requiring a single 
algebraic operation, this type of encryption is currently impractical because of 
the big key size, the large ciphertext expansion and the long evaluation 
time [8, 22]. 

Besides the GHV scheme, there are two asymmetric HE schemes that can 
encrypt matrices and support homomorphic matrix addition and multiplication. 
The first one was proposed in 2015 by Hiromasa, Abe, and Okamoto (HAO for 
short) [14], and their scheme is a matrix extension of the GSW-FHE scheme [12]. 
The security of the HAO scheme can be reduced from the standard LWE assumption, 
although an additional special circular security assumption is necessary. Its 
homomorphic matrix multiplication does not correspond to the “direct” matrix 
multiplication and needs to employ a randomized function. In 2018, Wang, Wang, 
Xue, and Huang (WWXH for short) [27] presented another FHE scheme for 
encrypting matrices. Security of their scheme is based on hardness of the stan- 
dard LWE problem, and the size of ciphertext matrices is smaller than that 
of the HAO scheme. However, for the WWXH scheme, the tensor product is 
largely employed to perform the homomorphic matrix multiplication, and the 
corresponding computational cost is @(m*) for m x m input matrices. Thus, 
the complexity of using this scheme for some homomorphic computations (e.g., 
homomorphic computations over nondeterministic finite automata and linear 
algebra) is very large and not lower than that of the HAO scheme for the 
same computations. Some details of these two matrix-FHE schemes are listed in 
Table 1. Notably, total computational costs of both schemes are O(m’). 


Motivation and Our Target: Building a More Efficient GHV-Type 
HE Scheme. Based on the above descriptions, asymmetric matrix-FHE 
schemes [14, 27] are currently not a good match for very efficient cloud 
computing-related applications that only run a single (linear algebra) 
operation. A typical example is the private and verifiable delegation of linear 
algebra [17], which only allows a client to run O(m^α) computation for matrices 
of large size m × m, where α ∈ [2, 3] is close to 2. SHE schemes are much more 
efficient and suitable for 
many applications. In particular, the GHV scheme has a sequence of desirable 
properties. Allowing encryption of a square matrix from any matrix ring in one 
operation and supporting the “direct” homomorphic matrix multiplication can 
make this scheme match with applications requiring the linear algebra compu- 
tation over any ring, and be a powerful tool for the very efficient verifiable linear 
algebra computation. Although the construction of the GHV scheme is elegant, 
it seems that there are some optimizations left in its performance, and these 
optimizations can make it more versatile. This brings the main question that 
we want to answer in this work: Can we create a more efficient GHV-Type HE 
scheme? In more detail, this question involves the following three aspects: 


— The new GHV-type HE scheme has lower time complexity, and in particular 
it is suitable for applications only permitting efficient privacy protection and 
verification (e.g., the private and verifiable delegation of linear algebra). 

— The new GHV-type HE scheme has lower space complexity. To achieve this 
we need to first figure out whether some key employed by the original GHV 
scheme is not needed for the improved one. 

— The new GHV-type HE scheme has smaller key and ciphertext sizes than 
those of the original GHV scheme. 


Our Results. In this work, we propose an efficient GHV-type HE scheme 
together with optimized parameters. Security of our proposal is still based on 
the standard LWE assumption. Specifically, our contributions are fourfold: 


First Result (Sect. 3): Density of Trapdoor Matrix Pairs. Trapdoor 
generation algorithms (e.g., [1, 3, 19]) play a big role in advanced lattice-based 
cryptographic primitives. They generate a pair of matrices (A^t, T^t), i.e., A^t 
is an (almost) uniformly random matrix and T^t is the corresponding trapdoor that 
is in the form of a nonsingular square matrix with short integer vectors. Some 
significant parameters related to the matrix pair, i.e., the lattice dimension and 
the quality of the trapdoor, generally have been explored when the corresponding 
trapdoor construction was given. In an asymmetric encryption scheme, (A^t, T^t) 
can be used as the public and secret keys. In particular, since the short 
basis T^t and its inverse (T^t)^{-1} used in the encryption schemes may be 
multiplied by matrices over a matrix ring, a natural question is how to evaluate 
the corresponding computational cost. To answer this question, we first introduce 
the concept of the density of a (trapdoor) matrix for matrix multiplication and 
give its definition. The density of a (trapdoor) matrix is measured by 
the number of nonzero elements of the matrix needed for a single matrix 
multiplication. Then, we take (T^t, (T^t)^{-1}) respectively generated by Alwen and 
Peikert’s trapdoor sampling algorithm (APTrapSamp for short) [3] and by 
Micciancio and Peikert’s trapdoor sampling algorithm (MPTrapSamp for short) [19] 
as targets and analyze their concrete density. Notably, the non-deterministically 
constructed components of these two trapdoor matrix pairs are the hard part 
of the corresponding density analyses. Technically, we thus employ matrix 
decomposition to simplify the complex components, which lets us focus 
on exploring components with a deterministic distribution. Using our concrete 
decompositions, for (T^t, (T^t)^{-1}) generated by APTrapSamp and MPTrapSamp, 
the analyses give accurate estimates on their density (see Lemmas 4 to 7). 
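
The intuition behind measuring density can be sketched in a few lines: when a matrix is multiplied from the left onto another matrix, only its nonzero entries contribute scalar multiplications. The code below is an illustrative sketch, not the paper's formal definition from Sect. 3; the helper names `density` and `sparse_left_multiply` are hypothetical, and matrices are plain lists of integer rows.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def density(t: Matrix) -> int:
    """Number of nonzero entries of t (an informal stand-in for its density)."""
    return sum(1 for row in t for v in row if v != 0)


def sparse_left_multiply(t: Matrix, x: Matrix, q: int) -> Tuple[Matrix, int]:
    """Compute t * x mod q while skipping zero entries of t.

    Returns the product and the number of scalar multiplications performed,
    which equals density(t) times the number of columns of x.
    """
    rows, inner, cols = len(t), len(x), len(x[0])
    out = [[0] * cols for _ in range(rows)]
    mults = 0
    for i in range(rows):
        for k in range(inner):
            if t[i][k] == 0:
                continue  # zero entries cost nothing
            for j in range(cols):
                out[i][j] = (out[i][j] + t[i][k] * x[k][j]) % q
                mults += 1
    return out, mults


if __name__ == "__main__":
    t = [[1, 0], [0, -1]]
    x = [[2, 3], [4, 5]]
    product, mults = sparse_left_multiply(t, x, q=7)
    print(density(t), mults, product)
```

A dense 2 × 2 matrix would need 8 scalar multiplications for the same product, so the count tracks the number of nonzero entries rather than the matrix dimension alone.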


Second Result (Sects. 3 and 4): More Accurate Efficiency Analyses. 
For the GHV-HE scheme, although an approximate estimate of its computational 
cost has been given in [10], a more accurate estimate of the computational 
cost is important, in particular for finding applications into which the 
cryptosystem can be plugged directly. Hence, we carefully analyze the encryption 
and decryption procedures of the GHV scheme using APTrapSamp and MPTrapSamp, 
and present accurate results on their computational cost and space cost (see 
Theorems 2 to 5). Technically, our analysis for the decryption procedure is based 
on the idea that multiplying matrices over a matrix ring with T^t (resp. (T^t)^{-1}) 
is equivalent to multiplying matrices over a matrix ring with the decomposition 
form of T^t (resp. (T^t)^{-1}). This implies that results on the density of T^t and 
(T^t)^{-1} are used for the efficiency analyses of the decryption procedure. We 
also employ Hoeffding’s inequality to estimate a (near-)lower bound of the 
computational cost of the decryption procedure. Of course, the same idea is 
used to give the (time and space) efficiency analyses on our optimized GHV-type 
scheme (see Theorem 10). 


Third Result (Sect. 4): Simpler Construction and Optimizations. 
Towards addressing the question of the above section in a systematic way, we 
first propose a generic GHV-type construction that removes the expensive matrix 
inversion computation for (T^t)^{-1} and the multiplication by (T^t)^{-1} on decryption. 
In our generic construction, a sparse matrix T that is easily built is employed to 
recover the plaintext message (see Sect. 4.2 on T). Notice that T is constructed 
deterministically, which means that it actually can be regarded as a “public” key 
for decryption. Moreover, our generic construction has an additional benefit for 
the multiplication by T^t on decryption. That is, a plaintext message can be 
recovered by multiplying with part of T^t instead of T^t, which further reduces the 
computational cost and space cost of decryption. Then we present some simple 
optimizations on speeding up the matrix multiplication used in our generic 
construction (see Algorithms 2 and 3). For our optimizations, only element-wise 
additions are employed to achieve the multiplication by part of T^t and by T on 
decryption, and a random, short component of T^t is used as the unique secret key 
(i.e., the component matrix R). This implies that our optimizations guarantee that 
any instantiation of our GHV-type scheme using APTrapSamp-like trapdoor 
generation algorithms has the asymptotically optimal time complexity and storage 
size of the secret key. Surprisingly, we achieve these efficiency improvements 
without having a negative effect on the security of the GHV-type scheme. 


Fourth Result (Sect. 4): Tighter Parameters. To ensure that our GHV-type 
instantiation using APTrapSamp enjoys correctness and the same homomorphism 
as the original GHV instantiation using APTrapSamp, we show new bounds 
for the modulus q and the lattice dimension m (see Theorems 6 and 7). In 
particular, the parameter bounds that we establish are lower than those of the 
original GHV instantiation. Since q has a direct impact on the key and ciphertext 
sizes, this means that the sizes of elements of the public key and ciphertext can 
be smaller than those of the original GHV instantiation. Specifically, we first 
give a parameter setting for the case that our GHV-type instantiation only 
supports polynomially many additions (see Theorem 6). Then we present a parameter 
setting for the case that our GHV-type instantiation permits a polynomial 
number of additions and one multiplication (see Theorem 7). 


Table 1. Comparisons of asymmetric LWE-based matrix-HE schemes with equal 
parameters n and m satisfying n < m. 


oo ani ae og Te 
gave Pe so j oo aan 
et | of 

GHV(APTrapSamp) [10] SHE v y O(nm?) O(m?) 
GHV(MPTrapSamp)* SHE v y O(nm?) O(m?) 
HAO [14] FHE YK O(nm?) O(m?) 

WWXH [27] FHE ¥ x O(nm?) O(m? 
oGHV(APTrapSamp) [this paper] SHE WA v O(nm?) O(nm?) 
oGHV(MPTrapSamp) [this paper] SHE v y O(nm?) O(nm?) 


* For GHV(MPTrapSamp), MPTrapSamp is used in the original GHV scheme. 
† ⊕ and ⊙: “Direct” matrix addition and multiplication between the input 
ciphertexts 


Comparisons and Applications. A comparison of our optimized GHV 
(oGHV for short) scheme with the GHV scheme and other asymmetric matrix- 
(F)HE schemes is shown in Table 1, where we assume that all the schemes make 
use of the same security parameter n and plaintext matrix size m × m. Notice 
that n and m are not the same; indeed, typically we have m = O(n lg q), where 
q = poly(n). 

Clearly, based on the above comparisons, we believe that our oGHV scheme 
can be plugged in as a “black box” to replace the original GHV scheme and 
deliver significant efficiency benefits in the applications discussed by Gentry et 
al. [10], e.g., electronic election protocols, private information retrieval protocols 
and identity-based encryption. Of course, the oGHV scheme may be used as a 
drop-in replacement in some other typical applications such as two-party 
computation protocols [16], graph encryption schemes supporting approximate 
shortest distance queries [18] and the protocol for private regular-expression 
searches on encrypted data [26]. Here we want to highlight that, compared with the 
GHV scheme, our oGHV scheme opens the door to more efficient real-world 
privacy-preserving applications. One such example is the private and verifiable 
delegation of linear algebra, which has long been an important research subject in 
cryptography. Although Mohassel [17] has given GHV-based delegation protocols 
for some linear algebra problems such as matrix multiplication and matrix 
inversion, as shown in Table 1, the GHV scheme actually should be excluded from 
consideration due to its “heavy” decryption performing roughly O(m) computations. 
Since our oGHV scheme achieves the desirable improvements in terms 
of the efficiency, it can be a natural match for private delegations of some linear 
algebra problems and even specific computations related to linear algebra. 


2 Preliminaries 


Notations. Throughout this paper, we use capital letters (e.g., X, Y) for random 
variables and probability distributions, standard letters (e.g., x, y) for scalars, 
and calligraphic letters (e.g., 𝒳, 𝒴) for sets. We denote (column) vectors by 
standard bold letters (e.g., x, y) and matrices by capital bold letters (e.g., X, 
Y). For a matrix X over any ring, the ith column of X is denoted by x_i, the ith 
element of a vector x is denoted by x_i, and the ith element of the jth column of 
X is denoted by x_{i,j}. N_X is a random variable (or probability distribution) on the 
number of nonzero elements of X. We use X^t to denote the transpose of X. The 
ith standard basis vector is denoted by e_i. lg refers to the base 2 logarithm. We 
use [z] to denote the set {1, 2, ..., z}. x ←$ 𝒳 is considered as sampling an element 
x from a finite set 𝒳 uniformly at random, and x ← X refers to sampling an 
element x according to a probability distribution X. For a finite set 𝒳, we denote 
the uniform distribution over 𝒳 by U(𝒳). We denote the binomial distribution 
with parameters p ∈ [0, 1] and m ∈ N_+ by Bin_{p,m}, where Pr[Bin_{p,1} ≠ 0] = p 
and Pr[Bin_{p,1} = 0] = 1 − p. We denote the discrete Gaussian (error) distribution 
over Z_q by Ψ_β(q), which may be generated by sampling y ← (1/β) exp(−π(y/β)^2) and 
outputting ⌊q · y⌉ (mod q), where β > 0 and q > 2. X ∼ D denotes that a 
random variable X follows a probability distribution D. For two distribution 
ensembles X := {X_n} and Y := {Y_n} indexed by n ∈ N_+, X ≈_s Y refers to 
the statistical indistinguishability between X and Y. x (mod q) is considered as 
mapping x into the interval (−q/2, q/2]. Let x = (x_1, x_2, ..., x_n) ∈ {0, 1}^n be the 
binary representation of x. Then we call bwt(x) = ||x||_1 = #{i ∈ [n] | x_i ≠ 0} 
the (Hamming) weight of x. Let t_a, t_m and t_s denote the running time of the 
(modulo) addition, (modulo) multiplication and discrete Gaussian sampling over 
the integers, respectively. 
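
Two of the conventions above are easy to state in code. The sketch below assumes that reduction modulo q maps into the centered interval (−q/2, q/2], which is the usual convention but is an assumption here, and that bwt counts the ones in the binary representation of a nonnegative integer; the function name `centered_mod` is hypothetical.

```python
def centered_mod(x: int, q: int) -> int:
    """Map x to its representative modulo q in the interval (-q/2, q/2] (assumed convention)."""
    r = x % q  # r lies in [0, q)
    return r - q if r > q // 2 else r


def bwt(x: int) -> int:
    """Hamming weight of the binary representation of a nonnegative integer x."""
    if x < 0:
        raise ValueError("x must be nonnegative")
    return bin(x).count("1")


if __name__ == "__main__":
    print(centered_mod(7, 8), centered_mod(4, 8), bwt(13))
```

With q = 8, the residue 7 is represented by −1 while 4 stays at 4, and 13 = 1101 in binary has weight 3.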

We also use the following simplified notations in this paper. Throughout, we 
denote the security parameter by n ∈ N_+, and most parameters are functions 
of n, e.g., m_1, m_2, m, q = poly(n), β = O and c = c(n) > 0, where poly(n) 
denotes some polynomial function in n. Thus, we often omit n for the simplified 
notations. Moreover, overwhelming probability means that the probability is 
1 − ν, where ν is negligible in n. 


2.1 Cryptographic Problem 


We present below a famous hard learning problem, i.e., the Learning with Errors 
(LWE) problem, which has proven to be a rich and versatile source of many 
(post-quantum) cryptographic primitives. 


Definition 1 (LWE [10,23]). Let n, m, q be positive integers, s ∈ Z_q^n be a secret
vector, and χ be a probability distribution over Z_q. We denote the LWE
distribution by L_{s,χ,q}, that is, the probability distribution over Z_q^{m×n} × Z_q^m
given by choosing A ←$ Z_q^{m×n}, sampling a vector x ← χ^m and outputting
(A, ⟨A, s⟩ + x) = (A, b) ∈ Z_q^{m×n} × Z_q^m.

The decision LWE problem dLWE(n, m, q, χ) is the problem of distinguishing
whether a sample (A, b) is drawn from L_{s,χ,q} or uniformly at random from
Z_q^{m×n} × Z_q^m. The search LWE problem sLWE(n, m, q, χ) is the problem of finding
the secret s from a sample (A, ⟨A, s⟩ + x) drawn according to L_{s,χ,q}.
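
As an illustration of Definition 1, the following Python sketch draws one sample (A, b = As + x) from an LWE distribution. It is a minimal sketch under stated assumptions, not a secure sampler: the error is a rounded continuous Gaussian with standard deviation β/√(2π) scaled by q, the function names are hypothetical, and NumPy's general-purpose generator stands in for a cryptographic source of randomness.

```python
import numpy as np


def sample_error(rng, beta, q, size):
    """Rounded-Gaussian error (assumed form): draw y from a normal distribution
    with standard deviation beta / sqrt(2*pi), then output round(q * y) mod q."""
    y = rng.normal(loc=0.0, scale=beta / np.sqrt(2.0 * np.pi), size=size)
    return np.rint(q * y).astype(np.int64) % q


def lwe_sample(rng, n, m, q, beta):
    """Return (A, b, s, x) with A uniform in Z_q^{m x n} and b = A s + x mod q."""
    s = rng.integers(0, q, size=n, dtype=np.int64)        # secret vector
    A = rng.integers(0, q, size=(m, n), dtype=np.int64)   # uniform public matrix
    x = sample_error(rng, beta, q, m)                     # error vector
    b = (A @ s + x) % q
    return A, b, s, x
```

The secret s and the error x are returned only so that the relation b − As = x (mod q) can be checked; a real LWE instance exposes only (A, b).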


In particular, χ is generally the discrete Gaussian distribution Ψ_β(q) [15]. The
LWE version defined with Ψ_β(q) is known as the "standard form". About
the hardness of the standard LWE problem, there have been several results,
e.g., [21,23]. Specifically, Regev [23] first proved that solving sLWE(n, m, q, β)
efficiently is as hard as finding a quantum solution for approximating certain
worst-case lattice problems, i.e., the decision version of the Shortest Vector
Problem (GAPSVP) and the Shortest Independent Vectors Problem (SIVP).
Regev [23] also showed that dLWE(n, m, q, β) can be equivalent to (worst-case)
sLWE(n, m, q, β) for a prime modulus q ∈ [2, poly(n)], with a loss of up to a
poly(n) · q factor in m. Then, Peikert [21] showed that solving sLWE(n, m, q, β)
efficiently is (at least) as hard as approximating GAPSVP (and a GAPSVP
variant) in the worst case via a classical (PPT) reduction with similar
parameters. Moreover, based on Regev's search-to-decision reduction above, Peikert
[21] provided a classical foundation for the hardness of dLWE(n, m, q, β). Notice
that s can be sampled from the error distribution (i.e., Ψ_β(q)^n) without any
loss in security [2]. In what follows, since (post-quantum) cryptographic
applications are typically based on dLWE(n, m, q, β), we summarize Regev's and
Peikert's results for the decision variant.
Lemma 1 (Theorem 1.1 in [23], Theorem 3.3 in [21]). Let n be a positive integer,
β > 0 and q ∈ N+ be a product of co-prime numbers, i.e., q = ∏_i q_i where
∀i ∈ N+ q_i = poly(n). For βq > 2√n, if there is an efficient algorithm solving
dLWE(n, m, q, β), there is an efficient quantum algorithm running in time poly(n)
to approximate GAPSVP and SIVP on n-dimensional lattices in the worst case to
within Õ(n/β) factors, and an efficient classical algorithm running in time poly(n)
to approximate a ζ-to-ζ' GAPSVP variant GAPSVP_{ζ,ζ'} on n-dimensional lattices
in the worst case to within ζ = Õ(q√n) and ζ' = Õ(n/β) factors.


2.2 Trapdoor Sampling Algorithms 


Here we recall two significant trapdoor generation algorithms for cryptographic
lattices, which are inspired by Ajtai's initial work [1]. The first proposal is the
Alwen and Peikert trapdoor generator [3], denoted by APTrapSamp. This
randomized algorithm outputs a hard random lattice A^t ∈ Z_q^{n×m} together with
some short orthogonal basis (i.e., trapdoor) T^t ∈ Z^{m×m} of the lattice Λ^⊥(A^t),
where m = O(n lg q). The block structures of A^t and T^t are shown in Fig. 1(a),
where A_1 ←$ Z_q^{n×m_1} and m_1 + m_2 = m. In particular, APTrapSamp involves
two concrete algorithms. Compared with Alwen and Peikert's first algorithm,
the second algorithm, denoted by APSTrapSamp, can be regarded as an
optimized algorithm with respect to the lattice dimension and the quality of the
trapdoor. Thus, APSTrapSamp is more suitable for efficient cryptographic
applications. The second type of trapdoor generator is introduced by Micciancio
and Peikert [19], which is the current state of the art in trapdoor generation.
This randomized algorithm, denoted by MPTrapSamp, can output a
hard random lattice A^t ∈ Z_q^{n×m} together with a sufficiently "short" integer
matrix R ∈ Z^{m_1×m_2} as the gadget-based trapdoor (with a tag, e.g., I, over Z_q^{n×n}),
where m = O(n lg q) and m_1 + m_2 = m. MPTrapSamp includes the statistical
instantiation, denoted by MPSTrapSamp, and the computational instantiation.
In particular, the statistically secure trapdoor construction from MPSTrapSamp
is the better choice for cryptographic applications. Moreover, MPSTrapSamp may
generate a good basis T^t for Λ^⊥(A^t) from knowledge of R, which implies
that MPSTrapSamp can also serve as a "traditional" trapdoor sampling
algorithm. The corresponding block structures of A^t and T^t are given in Fig. 1(b),
where A_1 ←$ Z_q^{n×m_1}. Notice that, since the block structure of T^t generated by
MPSTrapSamp is similar to that of the trapdoor from APSTrapSamp, we refer
to the "traditional" MPSTrapSamp as the APTrapSamp-type trapdoor sampling
algorithm. In what follows, we state some consequences related to APSTrapSamp
and MPSTrapSamp. For more details, please refer to [3,19]. In the full version
of this paper [28, Appendix A.1], we also present details of the component
matrices G ∈ Z^{m_1×m_2} (G ∈ Z_q^{n×m_2}), P ∈ Z^{m_2×m_1}, U ∈ Z^{m_2×m_2} and R ∈ Z^{m_1×m_2}
generated by APSTrapSamp and MPSTrapSamp, respectively.


Lemma 2 (Theorem 3.2 in [3], Lemma 5.3 in [19]). There are PPT randomized
algorithms APTrapSamp and MPSTrapSamp that, on input 1^n, q > 2 and m =
O(n lg q), can generate matrices A^t ∈ Z_q^{n×m} and T^t ∈ Z^{m×m} such that

- A^t is statistically close to uniform over Z_q^{n×m},

- T^t is a "small" invertible matrix. In particular, the Euclidean norm of all
  columns of T^t from APSTrapSamp is bounded by O(n lg q), where the constant
  hidden in the O(·) is at most 20.

- TA = 0 (mod q).


[Figure 1: block-structure diagrams of A^t and T^t; panel (a) APSTrapSamp, panel (b) MPSTrapSamp]

Fig. 1. Block structures of A^t and T^t.


2.3 The Gentry-Halevi-Vaikuntanathan Encryption Scheme 


The GHV scheme [10] is a public-key encryption scheme for encrypting matrices
over any matrix ring Z_p^{m×m}, where p > 2. This scheme employs the idea of
the trapdoor function given by Gentry, Peikert and Vaikuntanathan in 2008
[11]: a near-uniformly random matrix A ∈ Z_q^{m×n} is the "public key",
and an invertible "small" matrix T ∈ Z^{m×m} such that TA = 0 (mod q) is
the trapdoor. To obtain the public and secret key pair for encryption
and decryption, the scheme runs an APTrapSamp-type sampling algorithm
(e.g., APSTrapSamp or MPSTrapSamp) that outputs such a key pair (A, T). The
basic construction of the GHV scheme, denoted by GHV, rests on the fact
that the trapdoor T can solve the standard LWE instance relative to A, which
implies that the security of GHV relies on the hardness of the standard LWE problem
dLWE(n, m, q, β) (see Lemma 1). For lack of space, we refer to [10] or the full
version of this paper [28, Appendix A.2] for more details of the GHV scheme.


2.4 Other Preliminaries 


Definition 2 (Density of a Matrix for Matrix Multiplication). Let X
and Y be matrices over any rings. In a single matrix multiplication XY over a
matrix ring, the density of X (resp. Y) is equal to the number of necessary nonzero
elements of X (resp. Y) over the ring, i.e., the nonzero elements that are used in XY.
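
To make the role of density concrete, the sketch below counts nonzero entries and the number of scalar multiplications in a product XY when zero operands are skipped. It is a simplification: it treats every nonzero entry as "necessary", and the helper names are hypothetical.

```python
import numpy as np


def density(M):
    """Number of nonzero entries of M (every nonzero is treated as necessary)."""
    return int(np.count_nonzero(M))


def scalar_mults(X, Y):
    """Scalar multiplications in X @ Y when zero operands are skipped:
    the sum over the inner index k of nnz(X[:, k]) * nnz(Y[k, :])."""
    col_nnz = np.count_nonzero(X, axis=0)   # nonzeros in each column of X
    row_nnz = np.count_nonzero(Y, axis=1)   # nonzeros in each row of Y
    return int(col_nnz @ row_nnz)
```

For two dense m × m matrices the count is m^3, whereas a sparse factor reduces it in proportion to its density; this is the effect the following sections quantify for T and T^{-1}.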


Lemma 3 (Fact 1 in [10]). Let n, q > 2 be positive integers, β > 0 and g =
ω(√(lg n)). For x ← Ψ_β(q)^n and an arbitrary vector y ∈ Z^n, |⟨x, y⟩| ≤ βqg||y||
with probability 1 − ϖ, where ||y|| is the Euclidean norm of y, and ϖ is negligible
in n.


Theorem 1 (Hoeffding Bound [13]). Let X_1, X_2, ..., X_k, where k ∈ N+, be a
sequence of independent random variables such that ∀i ∈ [1, k] Pr[X_i ∈ [a_i, b_i]] = 1.
Let X = Σ_{i=1}^{k} X_i. Then, for any τ > 0,
Pr[|X − E[X]| ≥ τ] ≤ 2 exp(−2τ^2 / Σ_{i=1}^{k} (b_i − a_i)^2).
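
The bound can be evaluated numerically with a small Python helper. This is a sketch of the standard two-sided Hoeffding inequality, 2·exp(−2τ^2 / Σ(b_i − a_i)^2); the function name and the clamping to 1 are our own choices.

```python
import math


def hoeffding_bound(intervals, tau):
    """Upper bound on Pr[|X - E[X]| >= tau] for X a sum of independent
    variables X_i, each supported on the interval [a_i, b_i]."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    denom = sum((b - a) ** 2 for a, b in intervals)
    if denom == 0:
        return 0.0                      # X is constant, so it never deviates
    return min(1.0, 2.0 * math.exp(-2.0 * tau ** 2 / denom))
```

For instance, for 100 independent variables in [0, 1] and τ = 10, the helper returns 2·exp(−2).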
3 Efficiency Analyses of GHV 


In this section, we precisely discuss the (time and space) efficiency of the original
scheme GHV and show why GHV using APSTrapSamp or even MPSTrapSamp is
"relatively" inefficient and should be ruled out for some cryptographic
applications that only run in time O(m^{c'}), where c' ∈ [2,3]. In particular, the density
of the "special" trapdoor matrix pair (T, T^{-1}) has a direct influence on the
efficiency of GHV, which means that it should be explored first. For APSTrapSamp
and MPSTrapSamp, although some significant parameters related to the output
lattice associated with A^t (and A) and the resulting basis T^t (and T), e.g., the
lattice dimension and the basis quality, have been explored in [3,19], to the best
of our knowledge, our work gives the first measure of the density of (T, T^{-1}) for
matrix multiplication.


3.1 On the Density of Trapdoor Matrix Pair (T, T^{-1}) 


We first give the density analysis of the matrix T generated by APSTrapSamp and
MPSTrapSamp (i.e., N_T), respectively. Then, we focus on exploring the density of the
corresponding inverse matrix T^{-1} over Z_p for p > 2 (i.e., N_{T^{-1}}). Interestingly, we
obtain N_T and N_{T^{-1}} based on simple and special decomposition forms of T and
T^{-1}. Notice that, for APSTrapSamp, when the modulus q is a prime, H can be of
the form [q e_1 ··· q e_n Ĥ], where Ĥ = [h_{i,j}] ∈ Z_q^{n×(m_1−n)} is the column reduction
form of the kernel of A_1. Since A_1 ←$ Z_q^{n×m_1} in APSTrapSamp, we present a
mild assumption on Ĥ as follows: if q is a prime, ∀i ∈ [n] and ∀j ∈ [n + 1, m_1],
h_{i,j} ←$ Z_q, which means Ĥ ←$ Z_q^{n×(m_1−n)}.

Lemma 4. For the trapdoor matrix T ∈ Z^{m×m} generated by APSTrapSamp,
it has the decomposition form T = ([GU −I; 0 0] + [R; I][U P])^t, where block
rows are separated by semicolons. Then, under the
assumption that ∀i ∈ [n] and ∀j ∈ [n + 1, m_1], h_{i,j} ←$ Z_q, where q is a prime, we
have Ny = Nr + Npa + M+ m + nwt bwt(q—1)), where Nr ~ Bin; and 


,dm2 
Npa N BiN: n(m -njw Where Py is the binary representation of {hilt € [n], j € 
[n+ 1, mı]}. 


Proof. A proof is given in the full version of this paper [28, Appendix A.3]. 


Lemma 5. Let the modulus q be a large enough prime. Consider that R is
sampled from the distribution over {0,±1}^{m_1×m_2} that outputs 0 with probability
1/2 and ±1 each with probability 1/4. For the trapdoor matrix T ∈ Z^{m×m} generated


by MPSTrapSamp, based on its decomposition form T = ([1 R] [} 8])', we have 
Nr = Nr + Np + 2m+ n(w-— 2 + bwt(q)), where NR ~ Bim and Np Š 


Z Mı M2 
Binı , 
gnam w 


1 We believe that a matrix sampled from the distribution over {0,±1}^{m_1×m_2} is
generally sparser than a matrix from the discrete Gaussian distribution for some
β' ≥ η_ε(Z).
Proof. A proof is given in the full version of this paper [28, Appendix A.4]. 


Lemma 6. For the inverse matric T7! € Zp” corresponding to T gener- 


minge ` 2 t 
ated by APSTrapSamp, it is of the form ({° oe ee PH ~ ; 


-H7! H-1(G+R) 
and can be sti as CY ae al [5] [a -H-(G+R)])) where 
U`! = diag(V, , Vol I) and in particular Vi € [wy] the ith column of 


the wk X wk ee Va: an) is Da 2*-Je;, where k € [mı]. Then, under 


the assumption that Vi € |n] and Vj € [n + 1, mı] hij È Za, where q is a prime, 
we have that Ny-1 is (at least) 2m + mı + nbwt(q — 1) + Np, + Yı + Y2 with 


sS . 
Yı & Binp-1 
P 


,n(m—=n) and Yo ~N Bini (anes: 


Proof. A proof is given in the full version of this paper [28, Appendix A.5]. 


Lemma 7. Let the modulus q be a large enough prime. Consider that R is
sampled from the distribution over {0,±1}^{m_1×m_2} that outputs 0 with
probability 1/2 and ±1 each with probability 1/4. For the inverse matrix T^{-1} ∈ Z_p^{m×m}
corresponding to T generated by MPSTrapSamp, it has the decomposition form
T= ee Pal es] F = ale where UT! = diag(V,,', Va" ,1). Then, 


0U 0 I 
we have Np-1 = Nr + Np +3m+ Y3, where NR ~ Bins Np & Č Bins niau 


n(w? W— 
and n(w— 1) < Y; < na Bu 2) 


‚Mı M2? 


Proof. A proof is given in the full version of this paper [28, Appendix A.6]. 


3.2 Theoretical Efficiency of GHV 


Now we analyze the computational cost and space cost of GHV when encrypting
matrices over Z_p^{m×m}. In particular, the cases of employing APSTrapSamp and
MPSTrapSamp are discussed, respectively. Using the results on the density of the
trapdoor matrix pair (T, T^{-1}) in Lemmas 4 to 7, we can give accurate estimates of
these two costs. Notice that we present (near-)lower bounds on these two costs
of the decryption procedure of GHV.


Theorem 2. For a plaintext matrix B ∈ Z_p^{m×m} (p > 2) that is encrypted by
GHV using APSTrapSamp, Enc(B) takes at most m^2((n + 1)t_m + (n + 2)t_a +
t_g) time to generate a ciphertext matrix C, and Dec(C) needs to take at least
2m(2=n n(m- n) + (d— 3 — 3,/8) mz + (m — n)nw + 4m)(tm + ta) time (with 
overwhelming probability) to recover B from C. 


Proof. A proof is given in the full version of this paper [28, Appendix A.7]. 


Notice that, letting mı = Hinlg q and m = Toa Mg q, which means 
m = 3nlgq < |8nlg q], from the consequence on Dec(C) in Theorem 2, we 


see that the computational cost of the decryption procedure of GHV employing
APSTrapSamp is at least 2m^3(t_m + t_a) (≈ O(m^3)).




Theorem 3. For a plaintext matrix B ∈ Z_p^{m×m} (p > 2) that is encrypted by
GHV using MPSTrapSamp, Enc(B) takes at most m^2((n + 1)t_m + (n + 2)t_a +
t_g) time to generate a ciphertext matrix C, and Dec(C) needs to take at least
2m( m4 (nwt m2) — \/2n(m (nw+ m2) + 1)+5m+ (2w—3)n)(tm+ ta) time (with 
overwhelming probability) to recover B from C. 


Proof. A proof is given in the full version of this paper [28, Appendix A.8]. 


Let us consider m_1 ≈ n lg q and m_2 = n⌈lg q⌉, which are used in the
"traditional" MPSTrapSamp construction. According to the consequence on Dec(C) in
Theorem 3, we see that the computational cost of the decryption procedure of
GHV employing MPSTrapSamp is about m^3(t_m + t_a) (≈ O(m^3)).


Theorem 4. For a plaintext matrix B ∈ Z_p^{m×m} (p > 2) that is encrypted by
GHV using APSTrapSamp, Enc(B) takes 2nm⌈lg q⌉ + m^2⌈lg p⌉ bits to generate
a ciphertext matrix C, and Dec(C) needs to take at least 2m^2⌈lg q⌉ + n(m_1 −
n)⌈lg p⌉ + 2dm_2 + n(m_1 − n)w bits to recover B from C.


Proof. A proof is given in the full version of this paper [28, Appendix A.9]. 


Theorem 5. For a plaintext matrix B ∈ Z_p^{m×m} (p > 2) that is encrypted by
GHV using MPSTrapSamp, Enc(B) takes 2nm⌈lg q⌉ + m^2⌈lg p⌉ bits to generate a
ciphertext matrix C, and Dec(C) needs to take at least 2m^2⌈lg q⌉ + m_1(2m_2 + nw)
bits to recover B from C.


Proof. A proof is given in the full version of this paper [28, Appendix A.10]. 


4 Our Optimized GHV-Type Encryption Scheme 


The above efficiency analysis confirms that GHV is not suitable for applications
(e.g., the private and verifiable delegation of computation) that must use data
protection techniques with roughly O(m^c) computational complexity, where
c ∈ [2,3] is close to 2. Hence, in this section, we modify the original scheme and
present our optimized variant, denoted by oGHV, which keeps the inherent
merits of the scheme and makes the corresponding running process more
efficient, e.g., achieving O(nm^2) computational overhead. In particular, to make
comparisons with the GHV instantiation that employs APSTrapSamp (see [10]),
APSTrapSamp is still used in our oGHV instantiation. Of course, MPTrapSamp
is also a candidate for oGHV. Notice that the trapdoor T^t := [T_1^t T_2^t], where
T_1^t := [(G+R)U; U] and T_2^t := [RP − I; P] (block rows separated by semicolons),
is adopted throughout the whole section.


4.1 Using a Sparse Matrix to Replace T^{-1} 


From Theorem 2, we know that GHV takes roughly O(nm^2) running time to
encrypt an m × m matrix and uses O(m^3) time to recover this matrix. In
particular, the computational cost of the decryption procedure is evidently larger than
that of the encryption procedure. This means that we can focus on optimizing
the decryption algorithm and reducing the corresponding cost to make the whole
cryptosystem more efficient. Notice that there exist two steps in the decryption
algorithm, i.e., C' = TCT^t (mod q) and B = T^{-1}C'(T^t)^{-1} (mod p).
Specifically, based on the fact that C is of the form AS + pX + B (mod q),
computing TCT^t (mod q) is an indispensable step, which is used to cancel out AS.
T^{-1}C'(T^t)^{-1} (mod p) can be seen as a "supplement" of TCT^t (mod q). The
main purpose of this step is to cancel out (T, T^t) and recover B. Although the
second step is similar to an additional operation, the corresponding
computation is expensive in the decryption procedure and has great influence on the
computational cost of the whole cryptosystem.
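
The two decryption steps can be traced end to end on a toy instance. The Python sketch below is an illustration of the algebra only and rests on several assumptions that are not part of the paper: it fixes n = 1, uses the public gadget vector (1, 2, ..., 2^(k−1)) as A instead of a near-uniform matrix (so it offers no security), builds T by hand rather than with APSTrapSamp or MPSTrapSamp, and replaces the Gaussian error by a bounded uniform one. All function names are hypothetical.

```python
import numpy as np


def gadget_trapdoor(q):
    """Toy key pair for n = 1: A is the gadget vector as a k x 1 matrix and
    T is a short integer matrix with T @ A = 0 (mod q)."""
    k = q.bit_length()
    A = np.array([[1 << i] for i in range(k)], dtype=np.int64)
    T = np.zeros((k, k), dtype=np.int64)
    for i in range(k - 1):
        T[i, i], T[i, i + 1] = 2, -1                 # 2*2^i - 2^(i+1) = 0
    T[k - 1, :] = [(q >> i) & 1 for i in range(k)]   # the bits of q sum to q
    return A, T


def inv_mod(M, p):
    """Inverse of a square integer matrix modulo a prime p (Gauss-Jordan)."""
    k = M.shape[0]
    aug = np.concatenate([M % p, np.eye(k, dtype=np.int64)], axis=1)
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r, col] % p != 0)
        aug[[col, pivot]] = aug[[pivot, col]]        # move the pivot row up
        aug[col] = (aug[col] * pow(int(aug[col, col]), -1, p)) % p
        for r in range(k):
            if r != col:
                aug[r] = (aug[r] - aug[r, col] * aug[col]) % p
    return aug[:, k:]


def encrypt(A, B, p, q, rng, err=1):
    """C = A S + p X + B (mod q) with a bounded stand-in for the error X."""
    m, n = A.shape
    S = rng.integers(0, q, size=(n, m), dtype=np.int64)
    X = rng.integers(-err, err + 1, size=(m, m), dtype=np.int64)
    return (A @ S + p * X + B) % q


def decrypt(T, T_inv_p, C, p, q):
    """Step 1 cancels A S modulo q; step 2 strips T and T^t modulo p."""
    E = (T @ C @ T.T) % q
    E = np.where(2 * E > q, E - q, E)      # centered representatives
    return (T_inv_p @ E @ T_inv_p.T) % p
```

With p = 2 and the odd modulus q = 65521, every entry of T(pX + B)T^t stays far below q/2, so the first step recovers it over the integers and the second step returns B. The second step uses a dense inverse of T modulo p, which is exactly the cost that the remainder of this section sets out to avoid.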

Thus, let us consider how to reduce the running time of the step T^{-1}C'(T^t)^{-1}
(mod p) and improve the efficiency of the whole decryption algorithm including
the first step. Ideally, we would like to find a sufficiently sparse matrix to
replace T^{-1} and "indirectly" recover B from C' by employing some other
simple computation. Unfortunately, this is not a computationally feasible
operation. However, from the definition of T in Sect. 2.2 (see Fig. 1(a)), we notice
that the m_2 × m_2 component matrix U is the main part of T
and satisfies N_U ≪ m_2^2 − N_U. According to Lemma 6, we also know that
U^{-1} = diag(V_{w_1}^{-1}, ..., V_{w_{m_1}}^{-1}, I) satisfies N_{U^{-1}} ≪ m_2^2 − N_{U^{-1}}. These observations
suggest that we can construct an extremely sparse matrix involving U^{-1},
denoted by T̃, to decrypt B from C. Specifically, the original plaintext matrix
B should first be enlarged to [0 0; 0 B] (block rows separated by semicolons) by
padding zero elements in the encryption algorithm. Notice that the number of
padded zero elements is far less than the number of elements of B. Then, in the
decryption algorithm, C' = TCT^t (mod q)
is executed and, after that, T̃ = [U^{-1}; 0] is used to recover B by running T̃^t C' T̃
(mod p). As described above, our optimization for the construction of GHV is
very simple but can surprisingly achieve the desired efficiency improvement. In
Sects. 4.3 and 4.5, we give the detailed correctness analysis for the optimized
scheme oGHV and also present the efficiency exploration of oGHV, which
supports our optimization. Moreover, here we highlight another merit of using T̃
instead of T^{-1}. That is, it is unnecessary to store T̃ for multiple encryptions.
From Lemma 6, we have that ∀i ∈ [m_1] V_{w_i} can be seen as a deterministically
constructed matrix, which means that T̃ is also a deterministically constructed
matrix that is easily reconstructed for multiple encryptions, while some
components of T^{-1} must be stored for each GHV encryption. In Sect. 4.4, we introduce
a concrete algorithm (i.e., Algorithm 3) to show how to efficiently run the
multiplication between T̃ (resp. T̃^t) and C' without using T̃ (resp. T̃^t).


4.2 Generic Construction of oGHV 


Now we give details on the generic construction of the optimized GHV-type HE
scheme oGHV with parameters n, m_1, m_2, m, q, β for plaintext matrices over Z_p
with any integer p > 2, where q is an odd prime, and β is a Gaussian error
parameter. In particular, oGHV, consisting of a triple of PPT algorithms (oKeyGen,
oEnc, oDec), is described below.
- oKeyGen(1^n) → (A, (T, T̃)): Run an APTrapSamp-type trapdoor sampling
  algorithm to get a matrix A ∈ Z_q^{m×n} and its trapdoor matrix T ∈ Z^{m×m} such
  that TA = 0 (mod q). Generate a matrix T̃ = [U^{-1}; 0] ∈ Z^{m×m_2} (block rows
  separated by semicolons), where U^{-1} ∈ Z^{m_2×m_2} is defined in Lemma 6.
  Output (A, (T, T̃)) as the public and secret key pair.
- oEnc_A(B) → C: Given a plaintext matrix B ∈ Z_p^{m_2×m_2}, build an m × m
  matrix B' = [0 0; 0 B] using three small zero matrices of respective size m_1 × m_1,
  m_1 × m_2 and m_2 × m_1, where m_1 + m_2 = m. Choose S ←$ Z_q^{n×m} and X ←
  Ψ_β(q)^{m×m}, and generate a ciphertext matrix C ∈ Z_q^{m×m} as C = AS + pX +
  B' (mod q).
- oDec_{(T,T̃)}(C) → B: Given the ciphertext matrix C, run C' = TCT^t (mod q)
  = T(pX + B')T^t (mod q) and output B = T̃^t C' T̃ (mod p).

In the above construction, if APSTrapSamp is employed by oKeyGen, from
Sect. 2.2 (cf. the full version of this paper [28, Appendix A.1]), we know that
m_1 can be equal to (1 + δ)n lg q and m_2 > (4 + 2δ)n lg q. This implies that m_1
and m_2 can satisfy m_2 ≫ m_1. Then, most of the elements of B' come from B, and
we have, in some sense, "oEnc_A(B) ≈ Enc_A(B)", where Enc is the encryption
algorithm of GHV. For concrete instantiations of the parameters m_1, m_2, m, q
and β, which are used to guarantee the correctness, security and
homomorphism of oGHV, please refer to Sect. 4.3. In particular, according to properties of
the proposed generic construction, the prime q related to the key and ciphertext
sizes can be set to be smaller than that used for GHV. Moreover, some detailed
optimizations based on the generic construction are presented in Sect. 4.4, which
further reduce the computational cost and memory cost of oGHV and guarantee
that the smallest key pair is employed. Notice that, similar to GHV,
the post-multiplication by T^t and T̃ on decryption in oGHV is unnecessary.
This means that oDec simply runs T̃^t(TC (mod q)) (mod p) for obtaining B.
The post-multiplication can be employed to decrypt product ciphertexts (see
Sect. 4.3).


4.3 Homomorphic Operations and Concrete Parameters 


Our optimized scheme oGHV enjoys the same homomorphic properties as GHV.
Specifically, oGHV also supports additive and multiplicative homomorphism.
In particular, for two ciphertext matrices C_1 = AS_1 + pX_1 + B'_1 (mod q)
and C_2 = AS_2 + pX_2 + B'_2 (mod q) corresponding to two plaintext matrices B_1
and B_2, considering the sum ciphertext C = C_1 + C_2 (mod q), we have

C = C_1 + C_2 (mod q) = A(S_1 + S_2) + p(X_1 + X_2) + B'_1 + B'_2 (mod q),

where S = S_1 + S_2, X = X_1 + X_2 and B' = B'_1 + B'_2.

2 Clearly, the state-of-the-art discrete Gaussian sampling algorithms over the integers
(e.g., [20]) can be considered as candidates used in oGHV to replace the sampling
method proposed by Gentry et al. [10]. What is important is that the corresponding
parameter setting needs to ensure that oGHV still has the desired correctness,
security and homomorphism.
It is easy to see that C_1 + C_2 (mod q) can be decrypted to B_1 + B_2 (mod p)
if the absolute values of all the elements of T(pX + B')T^t are smaller than q/2,
where B' = [0 0; 0 B_1 + B_2] (block rows separated by semicolons). Moreover,
considering the product ciphertext C = C_1 C_2^t (mod q), we have

C = C_1 C_2^t (mod q)
  = A(S_1 C_2^t) + p(X_1(pX_2 + B'_2)^t + B'_1 X_2^t) + B'_1 (B'_2)^t + (pX_1 + B'_1) S_2^t A^t (mod q),

where S = S_1 C_2^t, X = X_1(pX_2 + B'_2)^t + B'_1 X_2^t, B' = B'_1 (B'_2)^t and
S' = (pX_1 + B'_1) S_2^t. This naturally implies that C_1 C_2^t (mod q) can be
decrypted to B_1 B_2^t (mod p) when the absolute values of all the elements of
T(pX + B')T^t are smaller than q/2, where B' = [0 0; 0 B_1 B_2^t], as discussed
above. In what follows, we present our answer on
the parameter setting (for q, m_1, m_2, m and β), which guarantees the feasibility
of the homomorphic operations.

Notably, according to the above analysis on the additive homomorphism of
oGHV, we know that, similar to the case on decryption of the normal ciphertext,
the post-multiplication by T^t is not required for decrypting a sum ciphertext
C = Σ_{i=1}^{n^c} (AS_i + pX_i + B'_i) (mod q), where c > 0. Moreover, compared with
GHV, of which the correctness of decryption must rely on a condition that each
element of T(pX + B)T^t is bounded by q/2, we want to show that the correctness
of decryption of oGHV can depend on a more relaxed condition, which allows us
to set smaller parameters q and m. Specifically, consider that
C' = [C'_1 C'_2; C'_3 C'_4] (block rows separated by semicolons), where the block
matrices C'_1 = T_1CT_1^t (mod q), C'_2 = T_1CT_2^t (mod q),
C'_3 = T_2CT_1^t (mod q) and C'_4 = T_2CT_2^t (mod q) are of respective sizes m_2 × m_2,
m_2 × m_1, m_1 × m_2 and m_1 × m_1. Then we have T̃^t C' T̃ (mod p) = (U^{-1})^t C'_1 U^{-1}
(mod p). This means that the final result can be recovered if the absolute value
of each element in T_1(pX + B')T_1^t (instead of T(pX + B')T^t) is bounded by
q/2. Then, from the relaxed condition, we first set the parameters that simply
ensure that oGHV is able to support n^c additions. After that, we establish the
concrete parameters that guarantee that oGHV also holds the one-multiplication
homomorphism.
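
The block identity used above, namely that only the top-left m_2 × m_2 block of C' matters, can be checked numerically. The sketch below assumes that T̃ stacks U^{-1} on top of an m_1 × m_2 zero block; the helper names are hypothetical and any integer matrix may stand in for U^{-1}.

```python
import numpy as np


def tilde_T(U_inv, m1):
    """Stack U^{-1} (m2 x m2) on top of an m1 x m2 zero block (assumed layout)."""
    m2 = U_inv.shape[0]
    return np.vstack([U_inv, np.zeros((m1, m2), dtype=U_inv.dtype)])


def recover_block(C_prime, U_inv, p):
    """Compute (U^{-1})^t C'_1 U^{-1} mod p from the top-left block of C' only."""
    m2 = U_inv.shape[0]
    C1 = C_prime[:m2, :m2]          # the blocks C'_2, C'_3 and C'_4 are never read
    return (U_inv.T @ C1 @ U_inv) % p
```

Because the zero block of T̃ annihilates every other block of C', the correctness condition only has to bound the entries of T_1(pX + B')T_1^t, which is the relaxation exploited in the parameter settings that follow.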


Theorem 6. Consider that APSTrapSamp is employed by oGHV. For the fixed 
parameters n and c > 0, let q, mı, m2, M, B be set as q > 40n°+!plg n, m = 


mı + m > pole q| + + olg q, where my = ang q| and mz > mi nig q, 
and 3 = van Then, oGHV with parameters n, mı, m2, M, q, B supports n° 


homomorphic addition operations over the matrix ring Zg” ™ (and Zg2*™?). 
Proof. A proof is given in the full version of this paper [28, Appendix A.11]. 


Theorem 7. Consider that APSTrapSamp is employed by oGHV. For the fixed 
parameters n and c > 0, det q, mı, M2, m, B be set as q > 2'3n3+3¢p? lg? n, m= 


mi +m > [100018 q] + nlg q, where mı = H nlg q| and m > i nlg q, and 
B= Then, oGHV with parameters n, mı, m2, m, q, B supports n° 


2n 2 F on F agen 


homomorphic addition operations and one homomorphic multiplication operation 
over the matriz ring Z*™ (and Zg2*™?). 
Proof. A proof is given in the full version of this paper [28, Appendix A.12]. 


4.4 Computational Optimizations 


Generally speaking, matrix multiplication is a costly operation for
cryptographic primitives related to matrices. Here, according to the concrete
constructions of T^t and T from APSTrapSamp and the generic construction of the
cryptosystem presented in Sects. 2.2 and 4.2, we give some practical optimizations
that speed up the matrix multiplication used in oGHV and further improve the
efficiency of oGHV (see footnote 3).


Algorithm 1: Ternary-Integer Matrix Product
Input: X ∈ {0,±1}^{m_3×m_4}, Y ∈ Z^{m_4×m_5}, m_3, m_4, and m_5
Output: Z = XY ∈ Z^{m_3×m_5}

Z = {0}^{m_3×m_5};
for i ∈ [m_3] do
    for j ∈ [m_4] do
        for k ∈ [m_5] do
            if x_{i,j} == 0 then
                z_{i,k} += 0;
            else if x_{i,j} == 1 then
                z_{i,k} += y_{j,k};
            else if x_{i,j} == −1 then
                z_{i,k} −= y_{j,k};
            end
        end
    end
end
return Z;


Accelerating the Multiplication by a Ternary Matrix. Our idea is that, 
if a ternary matrix is involved in the matrix multiplication, the corresponding 
element multiplications are eliminated by running selections and additions. More 
concretely, a product can be obtained based on Algorithm 1. 
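
A minimal Python sketch of this idea follows. It is our own rendering of the selection-and-addition principle, vectorised over the innermost index, rather than a transcription of Algorithm 1; the function name is hypothetical.

```python
import numpy as np


def ternary_matmul(X, Y):
    """Product X @ Y for X with entries in {-1, 0, 1}, using only row
    selections, additions and subtractions (no scalar multiplications)."""
    X = np.asarray(X)
    Y = np.asarray(Y, dtype=np.int64)
    if not np.isin(X, (-1, 0, 1)).all():
        raise ValueError("X must be a ternary matrix")
    Z = np.zeros((X.shape[0], Y.shape[1]), dtype=np.int64)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            if X[i, j] == 1:
                Z[i, :] += Y[j, :]      # select row j of Y and add it
            elif X[i, j] == -1:
                Z[i, :] -= Y[j, :]      # select row j of Y and subtract it
            # a zero entry contributes nothing
    return Z
```

The number of row additions equals the number of nonzero entries of X, so the cost scales with the density of the ternary factor rather than with its full size.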


Decomposing the Multiplication by T^t and T. According to Algorithm 1,
we can get a method to replace the multiplications in TCT^t by selections and
additions. In particular, our technique is based on the decomposition form of
T^t (resp. T) in Lemma 4, where block rows are separated by semicolons.
Specifically, for [GU −I; 0 0] (resp. [GU −I; 0 0]^t), there is
(at most) a 1 in each column of GU, and others are zero elements. Then, (at
most) a 1 or −1 is in each column of [GU −I; 0 0], which means that the product of
[GU −I; 0 0] (resp. [GU −I; 0 0]^t) and some matrix can be achieved by simply
employing selections shown in Algorithm 1. For [R; I] (resp. [R; I]^t), since values
of all elements are from {0,±1}, Algorithm 1 can be directly used to obtain the
product of [R; I] (resp. [R; I]^t) and some matrix. For [U P] (resp. [U P]^t), values
of elements are from {−2,0,1}. Then, a modified Algorithm 1, where multiplying by −2 is
replaced by two additions, is suitable for generating the product of [U P] (resp.
[U P]^t) and some matrix. In Algorithm 2, how to run the multiplication by T and
T^t is shown. Notice that, as discussed in Sect. 4.3, for C' = [C'_1 C'_2; C'_3 C'_4] = TCT^t,
only C'_1 = T_1CT_1^t is the required matrix corresponding to the final result.
According to the decomposition form in Lemma 4, this means that Algorithm
2 simply needs to involve T_1^t = [GU; 0] + [R; I]U, where R can be regarded as the
only "secret" matrix.

3 The proposed optimizations do not only focus on T^t and T from APSTrapSamp.
Actually, some extremely similar optimizations can be developed for any
APTrapSamp-type trapdoor sampling algorithm (e.g., MPTrapSamp).


Algorithm 2: Multiplication by T’ and T 


Input: C € ape, R € {0,+£1}™*™, n, w, mi, m2, and m 
Output: Ci € Z72*™2 


i C=C; 

2 for i€ [m] do /* Running the multiplication by Ti */ 
3 for j € [m2] do 

4 Cig = Ĉi (j+m1)) 

5 for k € [mı] do /* Invoking Algorithm 1 */ 
6 Cag = Cig + Tk jČiki 

7 end 

8 end 

9 for j € [0, n — 1] do 

10 for k € [w— 1] do 

11 Ci, (wjtht1) = Ci,(wjtkt1) — (Ci,(ujtey + ntk); 

12 end 

13 Či, (wj+1) = Ci +1) + Ĉi, (wj+1); 

14 en 

15 end 

16 for i € [m2] do /* Running the multiplication by Tı */ 
17 for j € [n] do 

18 Cji = Ĉji; 

19 en 

20 for j € [m2] do 

21 ji = EG+m,),i3 

22 for k € [mı] do /* Invoking Algorithm 1 */ 
23 | ĉj = Ĉji + Tk jÈk,i; 

24 end 

25 end 

26 for j € [0, n — 1] do 

27 for k € [w— 1] do 

28 |  Čeujtk+i),i = Èfwjtktt),i — (Èlwjtk),i + Grupa), a) 

29 end 

30 ÈČ(wj+1),i = EG+1),¢ H lwjt) ii 

31 end 

32 end 


33 ci = the top-left m2 X mə block of C; 
34 return Ci; 


Simplifying the Multiplication by T̃ and T̃^t. For T̃ = [U^{-1}; 0] (block rows
separated by semicolons), where U^{-1} = diag(V_w^{-1}, ..., V_w^{-1}, I), we have
∀i ∈ [w] v_i^{-1} = Σ_{j=1}^{i} 2^{i−j} e_j, which implies that
v_{i'+1}^{-1} = e_{i'+1} + 2v_{i'}^{-1}, where i' ∈ [w − 1]. Based on this fact, for the case of
multiplying some matrix (e.g., C') with T̃ (resp. T̃^t), elements of the (wj+i+1)th
column (resp. row) of the corresponding product can be generated from elements
of the (wj + i)th column (resp. row) of the product by running 2m additions,
where i ∈ [w − 1] and j ∈ [0, n − 1]. Consider that C' = [C'_1 C'_2; C'_3 C'_4]. Since the
concrete multiplications can focus on C'_1, the "whole" product of
(U^{-1})^t C'_1 U^{-1} = T̃^t C' T̃
is computed as shown in Algorithm 3. Notice that Algorithm 3 does not need
any additional memory except the memory for storing C'_1.


Algorithm 3: Multiplication by T and T’ 
Input: C{ € Z%2%™2, n, w, and mg 
Output: B = (U~')'C/U~! € Zp2%™2 


1 for i€ [w-— 1] do 
2 for j € [0, n — 1] do 
3 for k € [m2] do 
+ A + + 
a Ck, (wj+i+1) = Ck, (wj+i+1) T Ck, (wji) T Ck, (wj+i)) 
5 Clwjtit1),k = CCwp tiga) sk + CCuppay.k + Cuppa) ri 
6 end 
7 end 
8 end 
9 


return B = C}; 
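
The doubling recurrence behind this simplification can be sketched in Python as follows. The sketch is not a transcription of Algorithm 3: it assumes n diagonal blocks of size w in U^{-1}, uses 0-based indices, and applies the column updates and the row updates as two separate in-place passes. The function names are hypothetical.

```python
import numpy as np


def build_u_inv(n, w, m2):
    """Explicit U^{-1} = diag(V^{-1}, ..., V^{-1}, I) with n blocks of size w,
    where column i of V^{-1} is the sum over j <= i of 2^(i-j) e_j."""
    U_inv = np.eye(m2, dtype=np.int64)
    for b in range(n):
        for i in range(w):
            for j in range(i):
                U_inv[b * w + j, b * w + i] = 1 << (i - j)
    return U_inv


def conjugate_in_place(C, n, w):
    """Overwrite C with (U^{-1})^t C U^{-1} using additions only."""
    for b in range(n):
        for i in range(1, w):
            a = b * w + i
            # column a += 2 * (already updated) column a-1, as two additions
            C[:, a] += C[:, a - 1] + C[:, a - 1]
    for b in range(n):
        for i in range(1, w):
            a = b * w + i
            # row a += 2 * (already updated) row a-1, as two additions
            C[a, :] += C[a - 1, :] + C[a - 1, :]
    return C
```

The explicit matrix from build_u_inv is only there to cross-check the result; the in-place routine never materialises U^{-1}, which mirrors the remark that T̃ does not need to be stored. A final reduction modulo p would follow in decryption.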


4.5 Property Analysis 


In this section, we present the analyses on correctness, security and efficiency of 
the optimized encryption scheme oGHV, respectively. 


Theorem 8. For a plaintext matrix B ∈ Z_p^{m_2×m_2}, oGHV with parameters n,
m_1, m_2, m, q, β that we can establish has correct encryption and decryption.


Proof. A proof is given in the full version of this paper [28, Appendix A.13]. 


Theorem 9. If there is a distinguishing algorithm with advantage € against the 
IND-CPA security of oGHV with parameters n, mı, m2, m, q, and B, then there 
must be a distinguisher against dLWE(n, m, q, 3) with roughly the same running 
time and advantage (at most) 5—. 


Proof. The security proof follows directly from the proof of IND-CPA security 
for GHV (see Theorem 2 in [10]). 


Theorem 10. For a plaintext matrix B ∈ Z_p^{m_2×m_2} that is encrypted by the
optimized scheme oGHV using APSTrapSamp, oEnc(B) takes (at most) m?(n+ 
1)(ta + tm) + Mty + Mta time and 2nmf|lg q| + mžflg p] bits to generate a 
ciphertext matrix C, and oDec(C) needs to take (at most) (($d+ yZ + 2)(m+ 
mz) +4n(w—1))meta time and (m? + m3) [lg q] + 2dmg bits to recover B. 


Proof. A proof can be found in the full version of this paper [28, Appendix A.14]. 


5 Conclusions 


In this paper, we have proposed an optimized GHV-type asymmetric HE scheme,
which is more efficient than the original GHV scheme. In particular, it provides
a much faster decryption algorithm, and the computational complexity of the
decryption is decreased from O(m^3) to O(nm^2). As with the GHV scheme, the
security of our new GHV-type scheme is based on the standard LWE problem,
and our scheme also supports matrix encryption. We have compared the
performance of our scheme with two LWE-based FHE schemes, which support matrix
operations, and the comparison result indicates that our scheme is more efficient.
We have also discussed the options of using APSTrapSamp or MPSTrapSamp in
the GHV scheme, and our optimizations can benefit both options.

Although we have given the optimized GHV-type HE scheme, how to make this
proposal more practical from the perspective of implementation is an
interesting open problem.
Abstract. We study the provable security claims of two NIST
Lightweight Cryptography (LwC) finalists, GIFT-COFB and Photon-Beetle,
and present several attacks whose complexities contradict the bounds
claimed in their final-round specification documents. For GIFT-COFB,
we show an attack using $q_e$ encryption queries and no decryption
query to break privacy (IND-CPA). The success probability is
$O(q_e/2^{n/2})$ for an $n$-bit block, while the claimed bound contains
$O(q_e^2/2^n)$. This positively solves an open question posed in
[Khairallah, ePrint 2021/648 (also accepted at FSE 2022)]. For
Photon-Beetle, we show an attack using $q_e$ encryption queries (using a
small number of input blocks) followed by a single decryption query and
no primitive query to break authenticity (INT-CTXT). The success
probability is $O(q_e^2/2^b)$ for a $b$-bit block permutation, which is
significantly larger than what the claimed bound indicates, since that
bound is independent of the number of encryption queries. We also show
a simple tag guessing attack that violates the INT-CTXT bound when
the rate is $r = 32$. Then, we analyze other (improved/modified) bounds of
Photon-Beetle shown in the subsequent papers [Chakraborty et al., ToSC
2020(2) and Chakraborty et al., ePrint 2019/1475]. As a side result of
our security analysis of Photon-Beetle, we point out that a simple and
efficient forgery attack is possible in the related-key setting.

We emphasize that our results do not contradict the claimed “bit
security” in the LwC specification documents for any of the schemes that
we studied. That is, we do not negate the claims that GIFT-COFB is
$(n/2 - \log n)$-bit secure for $n = 128$, and Photon-Beetle is
$(b/2 - \log(b/2))$-bit secure for $b = 256$ and $r = 128$, where $r$ is
the rate. We also note that security against related-key attacks is not
included in the security requirements of NIST LwC, and is not claimed by
the designers.


Keywords: Authenticated Encryption · Lightweight Cryptography ·
Provable Security · NIST


1 Introduction 


NIST Lightweight Cryptography¹ aims at standardizing authenticated encryption
(AE) schemes for resource-constrained devices. In March 2021, NIST
announced ten finalists among the 32 second-round candidates. The finalists
include GIFT-COFB [3] and Photon-Beetle [6]. GIFT-COFB is a block cipher-
based AE that combines a variant of COFB mode [13] and the lightweight 128-bit
block cipher GIFT [5]. Photon-Beetle is a permutation-based AE that combines
Beetle mode [11] and the lightweight cryptographic permutation Photon [19],
which is an ISO standard [1]. This paper studies the provable security bounds
of GIFT-COFB and Photon-Beetle, and shows some attacks whose success
probabilities are inconsistent with the security bounds presented in the
final-round specification documents of NIST LwC.

¹ https://csrc.nist.gov/projects/lightweight-cryptography.


GIFT-COFB. For the original COFB and GIFT-COFB, the security bounds for
the combined AE notion of IND-CPA and INT-CTXT were presented in [3,13].
Assuming a nonce-respecting attacker and that the underlying block cipher is a
random permutation, GIFT-COFB's AE bound is roughly
$\sigma^2/2^n + n q_d/2^{n/2}$ for $\sigma = \sigma_e + \sigma_d + q_e + q_d$,
where $\sigma_e$ (resp. $\sigma_d$) denotes the total queried blocks in
encryption (resp. decryption) queries, and $q_e$ (resp. $q_d$) denotes the
number of encryption (resp. decryption) queries. This bound suggests that if
(1) $\sigma_e$ reaches $2^{n/2}$, or (2) $\sigma_d$ reaches $2^{n/2}$, or
(3) $q_d$ reaches $2^{n/2}/n$, the bound reaches 1 and hence no security
guarantee is possible. The tightness of these conditions
has been studied by Khairallah [21-23] and Inoue and Minematsu (IM21) [20].
Khairallah [21-23] showed attacks with $q_d = 2^{n/2}$ and about
$\sigma_e = 2^{n/2}$ or $\sigma_e = 2^{n/4}$, called Weak Key attack and Mask
collision attack [21,22]. Khairallah finally showed one with $q_e = 1$,
$\sigma_e = O(1)$ (a few blocks) and $q_d = 2^{n/2}$, called
Mask Presuming attack [23]. The last one implies that the tightness condition
(3) has only a small gap of a $\log n$ factor. Inoue and Minematsu [20] studied
the tightness of (1) and showed an attack with $\sigma_e = 2^{n/2}$ and
$q_d = 1$. As in the previous attacks, this attack breaks the authenticity and
matches the aforementioned bound. Condition (2) remains unsolved, and [20]
mentioned that it might be an artifact of the proofs.

We take a closer look at condition (1). IM21's attack with $q_e$ encryption
queries and 1 decryption query has success probability roughly $q_e^2/2^n$.
However, we found an improved attack that needs $q_e$ encryption queries to
break privacy (hence the combined AE notion) with success probability roughly
$q_e/2^{n/2}$. The existence of such an attack was posed as an open problem
by Khairallah [23]. We solve this positively. This implies a contradiction
with the bound in the NIST LwC document, although the bit-level security is
maintained. We give a brief analysis of the root of this contradiction in
Sect. 3.2.


Photon-Beetle. For Photon-Beetle, the security proofs for the original version
and the NIST LwC version have been shown in [6,11,12]. For a $b$-bit block
permutation with $b = 256$ and rate (which is the length of one message block
processed in one permutation call) $r = 128$, the security bounds roughly tell
$b/2 - \log(b/2) = 121$-bit security for both IND-CPA and INT-CTXT.
Dobraunig and Mennink commented on a constant factor related to a key recovery
attack [18], and Mège analysed the security of the hash function [27].
We focus on the authenticity bound shown in the final-round NIST LwC
submission document [6], which is roughly
$q_p(q + q')/2^b + r q_p/2^{b/2} + q_p^r/2^{(b/2)(r-1)} + r\sigma'/2^{256-r}$,
where $q_p$, $q$, $q'$ and $\sigma'$ denote the number of primitive queries, the
number of encryption queries, the number of decryption queries, and the total
number of blocks in decryption queries, respectively. The rate can be either
$r = 128$ or $32$, where $r = 128$ is the primary setting. The tag length is
128 bits in both cases.
When $r = 128$, we observed that if $q_p = 0$, i.e., we do not query the
primitive (permutation), the above authenticity bound reduces to a bound that
has no contribution from encryption queries. We invalidate this by presenting
a simple forgery using $2^{b/2}$ encryption queries and a single decryption
query. The success probability is close to 1, while the claimed bound indicates
a negligibly small probability with that complexity. This attack shows an
inconsistency with the claimed bound and implies the lack of the birthday term
with respect to the block size, $O(q^2/2^b)$, in the claimed bound. Moreover,
when $r = 32$, the INT-CTXT bound reduces to a bound that is smaller than
$q'/2^{128}$, which is impossible to achieve for any AE with 128-bit tags.
Thus, a simple tag guessing attack (i.e., decryption queries with identical
nonce, AD, ciphertext, and distinct tags) invalidates the claimed bound. This
even implies a break of the bit-level security suggested by the bound. However,
the bit security shown in [6, Table 4.1] claims 128-bit authenticity. We
clarify that we do not break this figure. Moreover, we study other (improved
or modified) security bounds for Photon-Beetle shown in the subsequent papers
[15,16]. In [16], an improved AE bound is presented. The bound claims that
IND-CPA security is maintained beyond $2^{b/2}$ encryption queries, but this
is not possible to achieve. The same paper presents a simplified AE bound, and
we point out that this cannot be true. We then clarify that the ePrint version
[15] of [16] addresses the issue, while we still see an issue in the
simplification.

As a side result of our security analysis of Photon-Beetle for r = 128, we 
point out that a simple and efficient forgery attack is possible in the related-key 
setting, in which the attacker can modify the key used in the oracle [7,9,26]. 
In Photon-Beetle, a fixed constant is xor’ed into the secret key when the input 
(both AD and a message) is empty, and our forgery makes use of this fact. 
See [4,17,24] for examples of related-key attacks on some AE schemes. In the 
domain of public-key authenticated encryption, see [25]. 

Our attacks do not depend on the primitives and do not break the primitives. 
The attack against GIFT-COFB does not work against the COFB versions in [13, 
14] because of the shorter nonce length than the NIST LwC version. Our attacks 
show some inconsistencies in the claimed security bounds of GIFT-COFB and 
Photon-Beetle. At the same time, we would like to emphasize that these results 
do not negate the claimed bit security levels of GIFT-COFB and Photon-Beetle. 
We also note that the security against related-key attacks is not included in the 
security requirements of NIST LwC, and is not claimed by the designers. 
2 Preliminaries


2.1 Notations 


Our notations largely follow the specifications of GIFT-COFB and Photon-Beetle
[3,6]. Let $[i] := \{1,\ldots,i\}$ and $[\![i]\!] := \{0,1,\ldots,i\}$. Let
$\{0,1\}^*$ denote the set of all bit strings. The set of bit strings whose
length is a multiple of $n$ is denoted as $(\{0,1\}^n)^*$. For
$X \in \{0,1\}^*$, $|X|$ denotes its bit length. An empty string
$\varepsilon$ is a bit string of length zero; we have $|\varepsilon| = 0$. The
block length of $X \in \{0,1\}^*$ in $n$-bit blocks is denoted as
$|X|_n := \lceil |X|/n \rceil$. A concatenation of two bit strings $X$ and
$Y$ is written as $X \,\|\, Y$ or simply $XY$. Let $\mathrm{Trunc}_t(X)$ denote
the first $t \in [\![\,|X|\,]\!]$ bits of $X$, where
$\mathrm{Trunc}_0(X) = \varepsilon$. For two integers $a$ and $b$, we write
$a \mid b$ if $a$ divides $b$. For a bit string $X$, $X \ll c$ denotes the
left-shift of $X$ by $c$ bits. Bit rotation of $X$ by $c$ bits to the left
(right) is denoted by $X \lll c$ ($X \ggg c$).

For $X \in \{0,1\}^*$, the parsing operation of $X$ into $n$-bit blocks is
denoted by $(X[1],\ldots,X[x]) \xleftarrow{n} X$. Here, if
$X \ne \varepsilon$, $X[1] \,\|\, X[2] \,\|\, \cdots \,\|\, X[x] = X$ and
$|X[i]| = n$ for $i < |X|_n$, and $|X[x]| \in [n]$ for $x = |X|_n$. By writing
$(X_1, X_2) \xleftarrow{a_1, a_2} X$ we mean the parsing such that
$X_1 \,\|\, X_2 = X$, $|X_1| = a_1$ and $|X_2| = a_2$. If
$X = \varepsilon$, $x = 1$ and $|X[x]| = 0$ (i.e., the parsing yields the same
empty string). The sequence of $i$ zeros is denoted by $0^i$. We may use an
integer $i \in \{0,1,\ldots,2^n - 1\}$ to mean an element of $\{0,1\}^n$,
assuming the standard encoding, e.g., for $n = 4$, 3 denotes 0011.


Galois Field of $2^n$ Elements. An element $a$ in the Galois extension field
$\mathrm{GF}(2^n)$ will be interchangeably denoted as an $n$-bit string
$a_{n-1} \ldots a_1 a_0$ or an integer $\sum_{i=0}^{n-1} a_i 2^i$. Hence, by
writing $2 \cdot a$ or $2a$ when no confusion is possible, we mean the
multiplication of $a$ by $2 = x$. This operation is called doubling and
has been frequently used by various modes for the “domain separation” task.
See [28] for example. For $n = 64$ (that will be used for GIFT-COFB), we use the
primitive polynomial $x^{64} + x^4 + x^3 + x + 1$ to define the field
$\mathrm{GF}(2^n)$. In this case, the doubling $2 \cdot a$ is $(a \ll 1)$ if
$\mathrm{msb}_1(a) = 0$ and $(a \ll 1) \oplus (0^{59}11011)$ if
$\mathrm{msb}_1(a) = 1$, and the tripling $3 \cdot a$ means
$2 \cdot a \oplus a$. Combined expressions such as $2^i \cdot 3^j \cdot a$ are
defined analogously, namely $i$ doublings and $j$ triplings of $a$.
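
The doubling and tripling rules above translate directly into a few lines of code. The following is a minimal sketch for the 64-bit case, assuming that field elements are held as Python integers with the most significant bit at position 63. The helper names `double`, `triple`, and `mask` are ours and do not come from the specification.

```python
MASK64 = (1 << 64) - 1
# Low-order part of x^64 + x^4 + x^3 + x + 1, i.e. the string 0^59 11011.
REDUCTION = 0b11011


def double(a: int) -> int:
    """Return 2*a in GF(2^64): shift left, reduce if the msb was set."""
    shifted = (a << 1) & MASK64
    return shifted ^ REDUCTION if a >> 63 else shifted


def triple(a: int) -> int:
    """Return 3*a, computed as 2*a XOR a."""
    return double(a) ^ a


def mask(a: int, i: int, j: int) -> int:
    """Return 2^i * 3^j * a, i.e. i doublings followed by j triplings."""
    for _ in range(i):
        a = double(a)
    for _ in range(j):
        a = triple(a)
    return a
```

Since doubling is linear over GF(2), doublings and triplings commute, so the order in which `mask` applies them does not affect the result.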


2.2 Cryptographic Components 


A keyed function with key space $\mathcal{K}$, domain $\mathcal{X}$, and range
$\mathcal{Y}$ is a function $F : \mathcal{K} \times \mathcal{X} \to \mathcal{Y}$.
We may write $F_K(X)$ for $F(K, X)$. If Mode is a mode of operation
for $F$ using a single key $K \in \mathcal{K}$ for $F$, we write
$\mathrm{Mode}[F_K]$ instead of $\mathrm{Mode}[F]_K$. A block cipher is a keyed
function $E : \mathcal{K} \times \mathcal{M} \to \mathcal{M}$ such that for each
$K \in \mathcal{K}$, $E(K, \cdot)$ is a permutation over $\mathcal{M}$. A
cryptographic permutation $P : \mathcal{M} \to \mathcal{M}$
is simply a (keyless) permutation. GIFT-COFB is based on a block cipher, while
Photon-Beetle is based on a cryptographic permutation.

Let $\mathcal{A}$ be an adversary that queries $c$ oracles, $O_1,\ldots,O_c$,
in an arbitrary order and outputs a certain final output. By writing
$\mathcal{A}^{O_1,\ldots,O_c}$, we mean the final output of $\mathcal{A}$. Let
$\mathrm{Perm}(n)$ be the set of all permutations over $\{0,1\}^n$. For a
block cipher $E : \mathcal{K} \times \mathcal{M} \to \mathcal{M}$, the PRP
advantage is defined as

$$\mathbf{Adv}^{\mathrm{prp}}_{E}(\mathcal{A}) := \Pr\left[K \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{E_K} \Rightarrow 1\right] - \Pr\left[\pi \xleftarrow{\$} \mathrm{Perm}(n) : \mathcal{A}^{\pi} \Rightarrow 1\right].$$

The PRP advantage represents the indistinguishability of $E_K$ from the
uniform random permutation over the same message space for adversaries
performing queries to encryption oracles (either $E_K(\cdot)$ or $\pi(\cdot)$).


2.3 Authenticated Encryption 


We briefly describe the syntax and security notions of authenticated
encryption (AE). Our targets are both nonce-based AEs [8,29], which require the
nonce to be unique for each encryption. Let $\Pi$ denote a nonce-based AE scheme
consisting of an encryption function $\Pi.\mathcal{E}_K$ and a decryption
function $\Pi.\mathcal{D}_K$, for key $K \in \mathcal{K}$. For plaintext $M$
with nonce $N$ and associated data (AD) $A$, $\Pi.\mathcal{E}_K$
takes $(N, A, M)$ and returns ciphertext $C$ (typically $|C| = |M|$) and tag $T$.
Here, AD is a part of the input that is not encrypted but must be authenticated
(e.g., a protocol header). The tuple $(N, A, C, T)$ will be sent to the receiver.
For decryption, $\Pi.\mathcal{D}_K$ takes $(N, A, C, T)$ and returns a decrypted
plaintext $M$ if the authentication check is successful, and otherwise an error
symbol, $\bot$.


Security Notions. The security of AEs can be defined by two notions. The
privacy² notion is the indistinguishability of the encryption oracle
$\Pi.\mathcal{E}_K$ from the random-bit oracle $\$$, which returns random
$|M| + \tau$ bits for any query $(N, A, M)$, where $\tau$ is the tag length.
The adversary is assumed to be nonce-respecting, i.e., nonces can be arbitrarily
chosen but must be distinct for encryption queries. The privacy advantage is
defined as

$$\mathbf{Adv}^{\mathrm{priv}}_{\Pi}(\mathcal{A}) := \Pr\left[K \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{\Pi.\mathcal{E}_K(\cdot,\cdot,\cdot)} \Rightarrow 1\right] - \Pr\left[\mathcal{A}^{\$(\cdot,\cdot,\cdot)} \Rightarrow 1\right],$$

which measures the hardness of breaking the privacy notion for $\mathcal{A}$.
This notion corresponds to IND-CPA [8].

The authenticity notion is the probability of successful forgery via queries to
the $\Pi.\mathcal{E}_K$ and $\Pi.\mathcal{D}_K$ oracles. We define the
authenticity advantage as

$$\mathbf{Adv}^{\mathrm{auth}}_{\Pi}(\mathcal{A}) := \Pr\left[K \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{\Pi.\mathcal{E}_K(\cdot,\cdot,\cdot),\, \Pi.\mathcal{D}_K(\cdot,\cdot,\cdot,\cdot)} \text{ forges}\right],$$

where $\mathcal{A}$ forges if it receives a value $M' \ne \bot$ from
$\Pi.\mathcal{D}_K$. Here, to prevent trivial wins, if
$(C, T) \leftarrow \Pi.\mathcal{E}_K(N, A, M)$ is obtained earlier,
$\mathcal{A}$ cannot query $(N, A, C, T)$ to $\Pi.\mathcal{D}_K$. The adversary
must be nonce-respecting for encryption queries, but has no restriction on
decryption queries. This corresponds to the INT-CTXT notion [8].


² Following the literature (e.g., [28]), we conventionally refer to it as
privacy, but in practice, it may be more intuitive to call it confidentiality.
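It is also common to use a combined notion, sometimes called AE advantage,
defined as

$$\mathbf{Adv}^{\mathrm{AE}}_{\Pi}(\mathcal{A}) := \Pr\left[K \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{\Pi.\mathcal{E}_K(\cdot,\cdot,\cdot),\, \Pi.\mathcal{D}_K(\cdot,\cdot,\cdot,\cdot)} \Rightarrow 1\right] - \Pr\left[\mathcal{A}^{\$(\cdot,\cdot,\cdot),\, \bot(\cdot,\cdot,\cdot,\cdot)} \Rightarrow 1\right],$$

where the $\bot$ oracle denotes the oracle that always returns the rejection
symbol. It is known that the sum of the privacy and authenticity advantages is
a bound of the AE advantage [30], thus it compactly represents the security of
an AE scheme as a whole.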


3 Analysis of GIFT-COFB 


Specification. For reference, the specification of GIFT-COFB is shown in
Appendix A (Figs. 4 and 5). The padding function
$\mathrm{pad} : \{0,1\}^* \to (\{0,1\}^n)^*$ is a variant of so-called one-zero
padding and is defined as $\mathrm{pad}(X) = X$ if $X \ne \varepsilon$ and
$|X| \bmod n = 0$, and otherwise
$\mathrm{pad}(X) = X \,\|\, 10^{(n - (|X| \bmod n) - 1)}$. The $G$ in Fig. 4
denotes a matrix such that $G \cdot X := (X[2], X[1] \lll 1)$ for
$(X[1], X[2]) \xleftarrow{n/2} X$ and $X \in \{0,1\}^n$. We also write $G(X)$
to mean $G \cdot X$.

We show our attack against GIFT-COFB that contradicts the claimed security 
bound. As mentioned earlier, this does not invalidate the claimed bit security 
levels, namely 64-bit IND-CPA security and 58-bit INT-CTXT security in the 
specification document. 
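
As a concrete illustration of the padding function and of $G$, the sketch below models bit strings as Python strings of `'0'` and `'1'` characters. This representation and the function names are assumptions made for readability, and treating the operation inside `G` as a one-bit left rotation follows our reading of the notation in Sect. 2.1; the code is not taken from a reference implementation.

```python
def pad(x: str, n: int) -> str:
    """One-zero padding variant: non-empty multiples of n are left unchanged."""
    if x != "" and len(x) % n == 0:
        return x
    return x + "1" + "0" * (n - (len(x) % n) - 1)


def rotl(x: str, c: int) -> str:
    """Rotate the non-empty bit string x to the left by c positions."""
    c %= len(x)
    return x[c:] + x[:c]


def G(x: str) -> str:
    """Map X = X[1] || X[2] (two halves) to X[2] || (X[1] rotated left by 1)."""
    half = len(x) // 2
    return x[half:] + rotl(x[:half], 1)
```

Note that the empty string is padded to a full block `10...0`, which is why an empty AD still occupies one block in the mode.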




3.1 Our Attack 


The security bound shown in the latest NIST LwC specification document is as 
follows (with minor changes in notations): 


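Theorem 1 (Chapter 4 in [3]).

$$\mathbf{Adv}^{\mathrm{AE}}_{\text{GIFT-COFB}}((q_e, q_d), (\sigma_e, \sigma_d), t) \le \mathbf{Adv}^{\mathrm{prp}}_{\mathrm{GIFT}}(q', t') + \frac{\binom{q'}{2}}{2^n} + \frac{1}{2^{n/2}} + \frac{q_d(n + 4)}{2^{n/2+1}} + \frac{3\sigma_e^2 + q_d + 2(q_e + \sigma_e + \sigma_d) \cdot \sigma_d}{2^n},$$

where $q' = q_e + q_d + \sigma_e + \sigma_d$, which corresponds to the total
number of block cipher calls through the game, and $t' = t + O(q')$. Note that
the advantage is the maximum advantage over all the adversaries making $q_e$
encryption queries and $q_d$ decryption queries and running in time $t$, such
that $\sigma_e$ and $\sigma_d$ are the total number of blocks queried in the
encryption and decryption queries, respectively.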


Fig. 1. The first encryption query of the attack against GIFT-COFB.


The term $\mathbf{Adv}^{\mathrm{prp}}_{\mathrm{GIFT}}(q', t')$ denotes the
maximum PRP advantage over any adversary with $q'$ queries and $t'$ time
complexity. When we only use encryption queries, the above bound effectively
reduces to about $\sigma_e^2/2^n$ and hence about $q_e^2/2^n$ if each message
is short. We present an attack using $q_e$ encryption queries (where each
message is short) with success probability about $q_e/2^{n/2}$. This
contradicts the bound of Theorem 1, since $q_e^2/2^n < q_e/2^{n/2}$ necessarily
holds when $1 < q_e < 2^{n/2}$. The attack proceeds as follows.


1. The attacker makes a query $(N, A, M)$ to the encryption oracle such that
$|A| = n$, $|M| = 2n$ and $M = M[1] \,\|\, M[2]$ (for arbitrarily chosen $N$,
single-block $A$ and two-block $M$), and it obtains the corresponding $(C, T)$,
where $C = C[1] \,\|\, C[2]$, as shown in Fig. 1.

2. The attacker computes $Y[1] = M[1] \oplus C[1]$, $Y[2] = M[2] \oplus C[2]$,
and $\mathrm{lsb}_{n/2}(X[2]) = \mathrm{lsb}_{n/2}(G(Y[1]) \oplus M[1])$. Note
that $\mathrm{msb}_{n/2}(X[2])$ is unknown; nevertheless, the attacker can
mount a privacy attack by using the guessed $X[2]$ as the nonce of the next
encryption query.

3. For $0 \le i \le 2^{n/2} - 1$, the attacker queries $(N_i, A_i, M_i)$, where
$|A_i| = |M_i| = n$, to the encryption oracle such that

$$N_i = (i)_{n/2} \,\|\, \mathrm{lsb}_{n/2}(X[2]), \qquad L = \mathrm{Trunc}_{n/2}(Y[2]),$$
$$A_i = N_i \oplus G(Y[2]) \oplus (3L \,\|\, 0^{n/2}),$$
$$M_i = N_i \oplus G(Y[2]) \oplus (3^2 L \,\|\, 0^{n/2}),$$

where $(i)_{n/2}$ denotes the $n/2$-bit binary representation of $i$. The
attacker obtains the corresponding $(C_i, T_i)$. In the real world, there always
exists $i$ such that $M_i \oplus C_i = Y[2]$ and
$T_i = \mathrm{Trunc}_\tau(Y[2])$, namely the $i$ fulfilling $N_i = X[2]$.
In the ideal world,
$\Pr[M_i \oplus C_i = Y[2],\ T_i = \mathrm{Trunc}_\tau(Y[2])] = 1/2^{n+\tau}$
holds for all $i$, and thus the attacker can find $i$ such that
$M_i \oplus C_i = Y[2]$ and $T_i = \mathrm{Trunc}_\tau(Y[2])$ hold only with a
negligibly small probability, $1/2^{n/2+\tau}$.


In the real world, the above attack fails when $N = X[2]$ accidentally holds
because it prevents the attacker from using $X[2]$ for the next nonce. To prevent
such a case, the attacker can query a longer plaintext in Step 1, and it can find
$X[\cdot]$ such that $\mathrm{lsb}_{n/2}(X[\cdot]) \ne \mathrm{lsb}_{n/2}(N)$
with a sufficiently high probability.
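
The three steps above can be exercised end to end on a scaled-down model. The sketch below is an illustration under several assumptions that are not part of the specification: the block size is shrunk to $n = 16$ so that all $2^{n/2} = 256$ nonce guesses can be enumerated, a seeded random permutation stands in for the keyed block cipher, the mask field is GF($2^8$) with an arbitrarily chosen reduction polynomial, only full non-empty blocks are supported, and the tag is the whole final block. All class and function names are hypothetical.

```python
import random

N_BITS, H = 16, 8                 # toy block size n and half-block size n/2
HMASK = (1 << H) - 1


def dbl(a):
    """Doubling in GF(2^8); the polynomial x^8+x^4+x^3+x+1 is a toy choice."""
    a <<= 1
    return a ^ 0x11B if a >> H else a


def tpl(a):
    """Tripling: 3*a = 2*a XOR a."""
    return dbl(a) ^ a


def G(y):
    """G.Y = (Y[2], Y[1] rotated left by 1) on the two halves of Y."""
    y1, y2 = y >> H, y & HMASK
    return (y2 << H) | (((y1 << 1) | (y1 >> (H - 1))) & HMASK)


class ToyCOFB:
    """Real world: COFB-style encryption, full non-empty blocks only."""

    def __init__(self, seed):
        self.perm = list(range(1 << N_BITS))
        random.Random(seed).shuffle(self.perm)    # stands in for E_K
        self.used = set()

    def encrypt(self, nonce, ad, msg):
        assert nonce not in self.used, "adversary must be nonce-respecting"
        self.used.add(nonce)
        E = self.perm
        y = E[nonce]
        L = y >> H                                # L = Trunc_{n/2}(Y[0])
        for a_blk in ad[:-1]:
            L = dbl(L)
            y = E[a_blk ^ G(y) ^ (L << H)]
        L = tpl(L)                                # last AD block is full
        y = E[ad[-1] ^ G(y) ^ (L << H)]
        ct = []
        for m_blk in msg[:-1]:
            L = dbl(L)
            ct.append(m_blk ^ y)
            y = E[m_blk ^ G(y) ^ (L << H)]
        L = tpl(L)                                # last message block is full
        ct.append(msg[-1] ^ y)
        y = E[msg[-1] ^ G(y) ^ (L << H)]
        return ct, y                              # tag = whole last output block


class RandomBits:
    """Ideal world: fresh random bits for every query."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def encrypt(self, nonce, ad, msg):
        return ([self.rng.getrandbits(N_BITS) for _ in msg],
                self.rng.getrandbits(N_BITS))


def attack(oracle, nonce=0x1234):
    """Return True (guess 'real') if some guessed nonce reproduces Y[2]."""
    m1, m2 = 0x0F0F, 0x00FF
    (c1, c2), _ = oracle.encrypt(nonce, [0xAAAA], [m1, m2])   # Step 1
    y1, y2 = m1 ^ c1, m2 ^ c2                                 # Step 2
    x2_low = (G(y1) ^ m1) & HMASK      # the mask only touches the top half
    L = y2 >> H
    for i in range(1 << H):                                   # Step 3
        n_i = (i << H) | x2_low
        if n_i == nonce:
            continue                   # stay nonce-respecting
        a_i = n_i ^ G(y2) ^ (tpl(L) << H)
        m_i = n_i ^ G(y2) ^ (tpl(tpl(L)) << H)
        (c_i,), t_i = oracle.encrypt(n_i, [a_i], [m_i])
        if m_i ^ c_i == y2 and t_i == y2:
            return True
    return False
```

In this toy setting, `attack(ToyCOFB(seed))` finds the matching nonce unless $X[2]$ happens to equal the first nonce, which is the failure case discussed above, whereas against `RandomBits` a match occurs with probability about $2^{8}/2^{32} = 2^{-24}$, in line with the $1/2^{n/2+\tau}$ estimate.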

We remark that this attack does not work against versions of COFB in 
TCHES 2017 [13] and Journal of Cryptology [14] because the nonce length of 
these versions is n/2 bits. 
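
The gap between the attack's success probability and the corresponding term of the claimed bound follows from a one-line calculation. The worked instance below uses $n = 128$ and $q_e = 2^{40}$, values chosen purely for illustration.

```latex
\frac{q_e^2}{2^n} = \left(\frac{q_e}{2^{n/2}}\right)^{2} < \frac{q_e}{2^{n/2}}
\quad\text{whenever } 0 < \frac{q_e}{2^{n/2}} < 1,
\qquad\text{e.g. } n = 128,\ q_e = 2^{40}:\quad
\frac{q_e^2}{2^n} = 2^{-48}, \quad \frac{q_e}{2^{n/2}} = 2^{-24}.
```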


3.2 Brief Analysis on Security Proof 


As we mentioned in the previous section, the security bound shown in [3,
Chapter 4] includes neither the term $O(q_e/2^{n/2})$ nor $O(\sigma_e/2^{n/2})$.
However, in [3, Sect. 4.2], the authors provide an INT-CTXT bound, which
includes the term $3\sigma_e/2^{64}$ assuming $n = 128$. This term is somehow
missing in the final bound of the AE advantage that combines privacy and
authenticity. Still, in any case, since our attack uses only encryption queries,
the terms $O(q_e/2^{n/2})$ or $O(\sigma_e/2^{n/2})$ should appear in the IND-CPA
security bound, originally presented in [3, Sect. 4.1]. Let us look into [2],
which shows the full proof of GIFT-COFB. The authors define the following two
events as the bad events.


B1: $X_{i_1}[j_1] = X_{i_2}[j_2]$ for some $(i_1, j_1) \ne (i_2, j_2)$ where $j_1, j_2 > 0$.
B2: $Y_{i_1}[j_1] = Y_{i_2}[j_2]$ for some $(i_1, j_1) \ne (i_2, j_2)$ where $j_1, j_2 > 0$.


Here, $X_i[j]$ and $Y_i[j]$ denote the input and output of the $j$-th underlying
block cipher call in the $i$-th encryption query. Also, $X_i[0] = N_i$, where
$N_i$ is the nonce value in the $i$-th encryption query. As our attack shows,
the attacker can produce a collision between $X_i[0]$ and $X_j[2]$ with
probability $q_e/2^{n/2}$. One can speculate that this inconsistency could be
fixed by setting $j_1, j_2 \ge 0$ in the above events (then it covers the
presented attack), rather than $j_1, j_2 > 0$.


4 Analysis of Photon-Beetle 


Specification. For reference, we present the AEAD specification of
Photon-Beetle almost verbatim in Appendix A (Figs. 6 and 7). In the
specification, $\mathrm{ozs}_r(X)$, for any $X$ such that $|X| < r$, is another
variant of one-zero padding, defined as
$\mathrm{ozs}_r(X) = X \,\|\, 10^{r - |X| - 1}$. The expression $E\,?\,a : b$
evaluates to $a$ if $E$ holds and $b$ otherwise. Similarly,
$(E_1 \text{ and } E_2\,?\,a : b : c : d)$ evaluates to $a$ if
$E_1 \wedge E_2$ holds, $b$ if $E_1 \wedge \overline{E_2}$ holds, $c$ if
$\overline{E_1} \wedge E_2$ holds, and $d$ otherwise. The Shuffle in the $\rho$
and $\rho^{-1}$ functions is a function $\{0,1\}^r \to \{0,1\}^r$. It is defined
as $\mathrm{Shuffle}(S) = (S[2] \,\|\, S[1] \ggg 1)$, where
$(S[1], S[2]) \xleftarrow{r/2} S$.


We show our attacks against Photon-Beetle that violate its claimed security 
bound in NIST LwC documentation [6]. We emphasize that our attacks do not 
violate the claimed “bit security” levels of Photon-Beetle, which are 121-bit IND- 
CPA and INT-CTXT security when r = 128, and 128-bit IND-CPA and INT- 
CTXT security when r = 32. 


4.1 Claimed Security Bound and Our Attack 


In [6], Photon-Beetle is claimed to be provably secure, with the security bound 
of 


of + 2 4 9'%, Mh, o 
9256 ' 9256—r ' 9256 ' 9128 ' 9128(r—1) 


for privacy (IND-CPA), where $\sigma$ is the total number of blocks in encryption
queries, $q_p$ is the number of offline queries, $r$ is the rate ($r = 32$ or
$128$), $q$ is the number of encryption queries, and $\sigma_e$ is the total
number of blocks in encryption queries [6, Sect. 4.1]³. For authenticity
(INT-CTXT), the claimed bound is


$$O\left(\frac{q_p(q + q')}{2^{256}} + \frac{r q_p}{2^{128}} + \frac{q_p^{r}}{2^{128(r-1)}} + \frac{r\sigma'}{2^{256-r}}\right), \qquad (1)$$


where $q_p$ is the number of offline queries, $q$ is the number of encryption
queries, $q'$ is the number of decryption queries, $r$ is the rate ($r = 32$ or
$128$), and $\sigma'$ is the total number of blocks in decryption queries
[6, Sect. 4.2].

We present two attacks that invalidate the bound in (1). The observation is
that, when $q_p = 0$, i.e., when the attacker does not make offline queries,
the bound (1) simplifies to

$$O\left(\frac{r\sigma'}{2^{256-r}}\right). \qquad (2)$$

We observe that the bound (2) claims that authenticity is maintained
even if the attacker makes an unlimited number of encryption queries, and that
the success probability is smaller than $\sigma'/2^{128}$ when $r = 32$. In what
follows, we present attacks based on these observations.


Birthday Forgery Against Photon-Beetle. The attack is as follows. 


1. Let $q = 2^{b/2}$, and fix $q$ distinct nonces $N_1,\ldots,N_q$, $q$ distinct
AD $A_1,\ldots,A_q$ with $|A_i| = b$, and $q$ distinct messages
$M_1,\ldots,M_q$ with $|M_i| = b + r$. The attacker chooses $M_1,\ldots,M_q$ of
the form $M_i = M' \,\|\, M''_i$, where $|M'| = b$, $|M''_i| = r$, and
$M''_1,\ldots,M''_q$ take $q$ distinct values. That is, the first $b$ bits
of $M_1,\ldots,M_q$ take the same value $M'$, and the corresponding portions of
the ciphertexts are used to detect a full-state collision.

2. Make $q$ encryption queries $(N_1, A_1, M_1),\ldots,(N_q, A_q, M_q)$ and
obtain $(C_1, T_1),\ldots,(C_q, T_q)$, where $|C_i| = b + r$.

3. Find $(i, j)$ with $i \ne j$ such that $C'_i = C'_j$, where $C'_i$ is the
first $b$ bits of $C_i$, and the same for $C'_j$.

4. Output $(N_i, A_i, C_j, T_j)$ (or $(N_j, A_j, C_i, T_i)$) as the forgery.


See Fig. 2 for the process of $(N_i, A_i, M_i)$ and $(N_j, A_j, M_j)$ when
$r = 128$. With a high probability, we have a full-state collision, i.e., we
have $(i, j)$ such that $S_i = S_j$ in the figure. The collision can be detected
from $C'_i$ and $C'_j$, which are the first $b$ bits of $C_i$ and $C_j$. If this
happens, we see that the forgery in Step 4 succeeds.

The bound (2) claims that the success probability of the attack is negligibly
small and at most $O(7r/2^{256-r})$ when $r = 128$ (or at most
$O(6r/2^{256-r})$ depending on the interpretation of $\sigma'$), while the
attack succeeds with an overwhelming probability. Therefore, the bound (1) is
invalidated.
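
To get a feeling for how likely the full-state collision in Step 3 is, one can use the standard birthday approximation. The sketch below is a heuristic estimate only: it assumes that the $q$ internal states behave like independent uniform $b$-bit values, which is an idealization and not a statement proved here, and the function name is ours.

```python
import math


def collision_probability(q: int, b: int) -> float:
    """Approximate Pr[some pair among q uniform b-bit values collides]."""
    pairs_over_space = q * (q - 1) / 2 ** (b + 1)
    # 1 - exp(-x), computed via expm1 to stay accurate for tiny x.
    return -math.expm1(-pairs_over_space)
```

Under this heuristic, $q = 2^{b/2}$ queries already give a constant collision probability (about 0.39 for $b = 256$), and a small constant factor more queries pushes it toward 1, whereas for $q$ far below $2^{b/2}$ the probability is negligible.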


³ We do not know the difference between $\sigma$ and $\sigma_e$.
Fig. 2. Two encryption queries $(N_i, A_i, M_i)$ and $(N_j, A_j, M_j)$ when $r = 128$. Here,
$A_i = A_i[1] \,\|\, A_i[2]$ and $M_i = M'[1] \,\|\, M'[2] \,\|\, M''_i$.


Tag Guessing Attack Against Photon-Beetle with $r = 32$. When $r = 32$,
the above setting of $q_p = 0$ makes the INT-CTXT bound (1) reduce to
$32\sigma'/2^{256-32} = \sigma'/2^{219}$, which is smaller than
$\sigma'/2^{128}$. When $\sigma'$ is close to $q'$, this implies a bound that
is not possible to achieve with 128-bit tags. A simple tag guessing attack
invalidates this bound, that is, $q'$ decryption queries using an identical
(nonce, AD, ciphertext) tuple with distinct tags will succeed with probability
about $q'/2^{128}$.
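
The size of this mismatch can be checked with exact rational arithmetic. The snippet below is a small sketch that ignores the constant hidden in the $O(\cdot)$ notation and assumes that the correct tag is uniformly distributed from the attacker's point of view; the function names are hypothetical.

```python
from fractions import Fraction


def claimed_bound(sigma_dec: int, r: int = 32, b: int = 256) -> Fraction:
    """Bound (1) with q_p = 0, O(.) constant ignored: r * sigma' / 2^(b - r)."""
    return Fraction(r * sigma_dec, 2 ** (b - r))


def guessing_success(q_dec: int, tag_bits: int = 128) -> Fraction:
    """q' distinct tag guesses hit a uniform tag with probability q' / 2^tag_bits."""
    return Fraction(q_dec, 2 ** tag_bits)
```

For single-block decryption queries, where $\sigma' = q'$, the guessing probability exceeds the claimed value by a factor of $2^{91}$ regardless of $q'$.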


Discussion and Implication. In [6, Sect. 4.2], the designers outline the proof 
of the bound (1). To quote: 


Also, if an adversary can obtain a state collision among the input/output 
of a permutation query with the state of an encryption query or decryption 
query, it can use the fact to mount an forgery attack. 


The argument here ignores a full-state collision among encryption queries,
which results in the first attack. Here is another quote from the same document:


The trivial solution for forging is to guess the key or the tag which can be 
bounded by Thh. 


We do not find an issue here, while for r = 32, the bound (1) makes a stronger 
security claim than this argument. 

We note that the above two attacks need $2^{128}$ complexity, and thus do not
violate the claimed 121-bit security (when $r = 128$) or 128-bit security (when
$r = 32$). However, our attacks show that the theoretical reasoning for the bit
security in the NIST LwC document [6] is not stated accurately.
4.2 Analysis of the Bound in [16]


There are various provable security claims related to Beetle [6,11,12,15,16]. We
do not consider the bounds in [11,12] because of the difference in the
specification.

For Photon-Beetle, we write the combined AE advantage as
$\mathbf{Adv}^{\mathrm{AE}}_{\text{Photon-Beetle}}$, which is the same as the
combined AE notion defined in Sect. 2, except that the attacker has additional
oracles to compute the forward and inverse directions of the permutation, which
is modeled as a public random permutation.
In [16], improved provable security bounds of Photon-Beetle are presented.
Corollary 1 in [16] claims that, in the combined AE notion, the success
probability of the attacker for the case $r = 128$ is


Atog Aroq  4boa qp 2Ga , 20a(7 + qp) 


Adv>y, < 
VPhoton-Beetle(A) > gc T Je a Je $ OK + QT 9b 
6oedp | 8TGp | ATG | Te t+ Gp , ATOpoa 3 
2b 2c Qb—T 2b 92c ? ( ) 


where $\tau$ is the tag length, $c$ is the capacity, $r$ is the rate,
$b = r + c$, $\kappa$ is the key length, $q_e$ is the number of encryption
queries, $q_d$ is the number of decryption queries, $\sigma_e$ is the total
number of blocks in encryption queries, $\sigma_d$ is the total number of blocks
in decryption queries, $q_p$ is the number of offline queries, and
$\sigma = \sigma_e + \sigma_d$.

When $q_p = 0$ and $q_d = \sigma_d = 0$, the bound (3) is

$$\mathbf{Adv}^{\mathrm{AE}}_{\text{Photon-Beetle}}(\mathcal{A}) \le \frac{\sigma_e}{2^b},$$

i.e., it claims IND-CPA security up to $\sigma_e = 2^b$, which is flawed as we
show below. We note that the birthday forgery attack in Sect. 4.1 implies a
distinguishing attack with a comparable complexity as follows:


1. Let $q_e = 2^{b/2}$, and fix $q_e$ distinct nonces $N_1,\ldots,N_{q_e}$ and
$q_e$ distinct AD $A_1,\ldots,A_{q_e}$ with $|A_i| = b$. We also fix a message
$M$ with $|M| = b$.

2. Make $q_e$ encryption queries $(N_1, A_1, M),\ldots,(N_{q_e}, A_{q_e}, M)$
and obtain $(C_1, T_1),\ldots,(C_{q_e}, T_{q_e})$, where $|C_i| = b$.

3. If there exists $(i, j)$ with $i \ne j$ such that
$(C_i, T_i) = (C_j, T_j)$, then output 1 (real world).
Otherwise, output 0 (ideal world).


Since the $b$-bit state collision can be expected in the real world, the
attacker finds $(i, j)$ in Step 3 with a high probability. The attack makes
$q_e = 2^{b/2}$ encryption queries, no primitive query ($q_p = 0$), and no
decryption query ($q_d = \sigma_d = 0$), violating the bound (3).

In [16, Sect. 7.2], the following AE bound is claimed for r = 128: 


q 
AdvPhoton-Beetle(A) < 2 + 


(4) 


When $q_p = 0$, the bound claims perfect security both in IND-CPA and
INT-CTXT. Even the ideal AE scheme cannot have a perfect security bound in
authenticity. Our birthday forgery in Sect. 4.1 invalidates the INT-CTXT
claim, and the above distinguishing attack invalidates the IND-CPA claim.
The bound (4) is obtained from the bound (3) by using the relation

$$\sigma \le q_p, \qquad (5)$$

which is not the case in our attacks. We do not see how the relation (5) can be
ensured, as our attacks demonstrate that there are attackers with $q_p = 0$.

We clarify that the ePrint version [15] of [16] addresses the issue in the 
bound (3) with the following revised bound for r = 128: 


8roqg 8b? q2ou 2 2a(20 + 
AvP hoton-Beetle(A) < ‘ + Z + 2 + a zg ( 4) 


= ge Qb+c 2K Or 2b 
+ dp i 6Teqp i 12rqp _Fet+ 4 | ArdpOd (6) 
9b "9b ' Qe! 9b : 92c ? 


i.e., the revised bound contains a term $\sigma^2/2^b$. A full-state collision
in encryption queries is covered in the analysis of [16], and the above attack
no longer applies. The source of the gap seems to be an error in the final step
of the proof in [16], where the various terms are summed up and a term
$2\sigma^2/2^b$ appears to have been dropped.

In the ePrint version [15, Sect. 7.3.1], a simplified bound is presented. For 
r = 128, the bound is 


2 py2 
ae dp 20 f 10b dp j 24rqp l 120qp 
Ad VPhoton-Beetle (4) < DK + 9r I 9b T Je | 2b 5 


(7) 


which is obtained from the bound (6) by using the relation (5). We do not have 
an attack for this, but we do not know its correctness, as there are attackers 
outside of the relation (5). 


On SCHWAEMM. A NIST LwC finalist Sparkle [10] adopts Beetle. More specif- 
ically, the AE member of Sparkle, SCHWAEMM, uses Beetle with minor modi- 
fications. The specification document [10] does not present security bounds of 
SCHWAEMM nor mention the relationship with the original bounds of Beetle. 
Thus our analysis above does not have any implications to SCHWAEMM beyond 
the fact that it is based on Beetle. Moreover, as with the case of Photon-Beetle, 
we do not negate the bit security claims of SCHWAEMM. 


4.3 Related-Key Attack 


We present an efficient forgery attack against Photon-Beetle for $r = 128$ in the
related-key setting [7,9,26]. In this setting, we consider the security notion as
in Sect. 2, where we additionally assume that the adversary can modify the secret
key. The encryption oracle $\Pi.\mathcal{E}_K(\cdot,\cdot,\cdot)$ takes
$(N, A, M)$ and returns $(C, T) = \Pi.\mathcal{E}_K(N, A, M)$. In the related-key
setting, it additionally takes $\Delta \in \{0,1\}^k$, where $k$ is the bit
length of the secret key $K$. The related-key encryption oracle
returns $(C, T) = \Pi.\mathcal{E}_{K \oplus \Delta}(N, A, M)$ for a query
$(\Delta, N, A, M)$. The decryption oracle can also be defined to take an
additional input to modify the key, but we do not use this in our attack.


Fig. 3. The adversary makes a single encryption query with key $K \oplus 1$.
This immediately allows a forgery for empty message and AD.

Our attack goes as follows: 


1. Fix $(\Delta, N, A, M)$, where $\Delta = 1$, $N$ can be any nonce, $A$ is
empty, and $M$ can be any message such that $|M| \ge r$.

2. Make a related-key encryption query $(\Delta, N, A, M)$ and obtain $(C, T)$.
Let $M[1]$ be the first $r$ bits of $M$, and $C[1]$ be the first $r$ bits of $C$.

3. Return $(N, A', C', T')$ as the forgery, where $A'$ and $C'$ are empty, and
$T' = \mathrm{Shuffle}^{-1}(M[1] \oplus C[1])$.


See Fig. 3. We see that the encryption query with key $K \oplus 1$ simulates the
process for the empty message and AD, and the forgery in Step 3 succeeds with
probability 1. The attack makes one related-key encryption query and one
decryption query, and the success probability is 1.

We remark that the impact is limited, as the attack only forges the empty 
AD and message. We also remark that the security against related-key attacks 
is not included in the security requirements of NIST LwC, and is not claimed by 
the designers. However, this type of weakness is avoided, e.g., in SCHWAEMM. 


5 Conclusions 


We have investigated the provable security bounds in the specification documents
of two NIST LwC finalists, GIFT-COFB and Photon-Beetle, and reported
some attacks whose success probabilities are higher than what their bounds tell.
We have also analyzed other bounds of Photon-Beetle shown in the subsequent
papers and shown some attacks. As a side result, we presented a simple forgery
attack against Photon-Beetle when r = 128. We remark that our attacks do not
invalidate their claimed bit security levels, and that related-key security is
not claimed by the designers.


A Specifications of GIFT-COFB and Photon-Beetle


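Algorithm GIFT-COFB-$\mathcal{E}_K(N, A, M)$

 1. $Y[0] \leftarrow E_K(N)$, $L \leftarrow \mathsf{Trunc}_{n/2}(Y[0])$
 2. $(A[1], \ldots, A[a]) \xleftarrow{n} \mathsf{Pad}(A)$
 3. if $M \neq \epsilon$ then
 4.     $(M[1], \ldots, M[m]) \xleftarrow{n} \mathsf{Pad}(M)$
 5. for $i = 1$ to $a - 1$
 6.     $L \leftarrow 2 \cdot L$
 7.     $X[i] \leftarrow A[i] \oplus G \cdot Y[i-1] \oplus L \| 0^{n/2}$
 8.     $Y[i] \leftarrow E_K(X[i])$
 9. if $|A| \bmod n = 0$ and $A \neq \epsilon$ then $L \leftarrow 3 \cdot L$
10. else $L \leftarrow 3^2 \cdot L$
11. if $M = \epsilon$ then $L \leftarrow 3^2 \cdot L$
12. $X[a] \leftarrow A[a] \oplus G \cdot Y[a-1] \oplus L \| 0^{n/2}$
13. $Y[a] \leftarrow E_K(X[a])$
14. for $i = 1$ to $m - 1$
15.     $L \leftarrow 2 \cdot L$
16.     $C[i] \leftarrow M[i] \oplus Y[i+a-1]$
17.     $X[i+a] \leftarrow M[i] \oplus G \cdot Y[i+a-1] \oplus L \| 0^{n/2}$
18.     $Y[i+a] \leftarrow E_K(X[i+a])$
19. if $M \neq \epsilon$ then
20.     if $|M| \bmod n = 0$ then $L \leftarrow 3 \cdot L$
21.     else $L \leftarrow 3^2 \cdot L$
22.     $C[m] \leftarrow M[m] \oplus Y[a+m-1]$
23.     $X[a+m] \leftarrow M[m] \oplus G \cdot Y[a+m-1] \oplus L \| 0^{n/2}$
24.     $Y[a+m] \leftarrow E_K(X[a+m])$
25.     $C \leftarrow \mathsf{Trunc}_{|M|}(C[1] \| \cdots \| C[m])$
26.     $T \leftarrow \mathsf{Trunc}_{\tau}(Y[a+m])$
27. else $C \leftarrow \epsilon$, $T \leftarrow \mathsf{Trunc}_{\tau}(Y[a])$
28. return $(C, T)$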


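Algorithm GIFT-COFB-$\mathcal{D}_K(N, A, C, T)$

 1. $Y[0] \leftarrow E_K(N)$, $L \leftarrow \mathsf{Trunc}_{n/2}(Y[0])$
 2. $(A[1], \ldots, A[a]) \xleftarrow{n} \mathsf{Pad}(A)$
 3. if $C \neq \epsilon$ then
 4.     $(C[1], \ldots, C[c]) \xleftarrow{n} \mathsf{Pad}(C)$
 5. for $i = 1$ to $a - 1$
 6.     $L \leftarrow 2 \cdot L$
 7.     $X[i] \leftarrow A[i] \oplus G \cdot Y[i-1] \oplus L \| 0^{n/2}$
 8.     $Y[i] \leftarrow E_K(X[i])$
 9. if $|A| \bmod n = 0$ and $A \neq \epsilon$ then $L \leftarrow 3 \cdot L$
10. else $L \leftarrow 3^2 \cdot L$
11. if $C = \epsilon$ then $L \leftarrow 3^2 \cdot L$
12. $X[a] \leftarrow A[a] \oplus G \cdot Y[a-1] \oplus L \| 0^{n/2}$
13. $Y[a] \leftarrow E_K(X[a])$
14. for $i = 1$ to $c - 1$
15.     $L \leftarrow 2 \cdot L$
16.     $M[i] \leftarrow Y[i+a-1] \oplus C[i]$
17.     $X[i+a] \leftarrow M[i] \oplus G \cdot Y[i+a-1] \oplus L \| 0^{n/2}$
18.     $Y[i+a] \leftarrow E_K(X[i+a])$
19. if $C \neq \epsilon$ then
20.     if $|C| \bmod n = 0$ then
21.         $L \leftarrow 3 \cdot L$
22.         $M[c] \leftarrow Y[a+c-1] \oplus C[c]$
23.     else
24.         $L \leftarrow 3^2 \cdot L$, $c' \leftarrow |C| \bmod n$
25.         $M[c] \leftarrow \mathsf{Trunc}_{c'}(Y[a+c-1] \oplus C[c]) \| 10^{n-c'-1}$
26.     $X[a+c] \leftarrow M[c] \oplus G \cdot Y[a+c-1] \oplus L \| 0^{n/2}$
27.     $Y[a+c] \leftarrow E_K(X[a+c])$
28.     $M \leftarrow \mathsf{Trunc}_{|C|}(M[1] \| \cdots \| M[c])$
29.     $T' \leftarrow \mathsf{Trunc}_{\tau}(Y[a+c])$
30. else $M \leftarrow \epsilon$, $T' \leftarrow \mathsf{Trunc}_{\tau}(Y[a])$
31. if $T' = T$ then return $M$, else return $\bot$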


Fig. 4. Algorithms of GIFT-COFB [3, Fig. 2.3]


Fig. 5. GIFT-COFB.


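Algorithm Photon-Beetle-$\mathcal{E}[r]_K(N, A, M)$

 1. $IV \leftarrow N \| K$, $C \leftarrow \epsilon$
 2. if $(A = \epsilon) \wedge (M = \epsilon)$
 3.     $T \leftarrow \mathsf{TAG}_{128}(IV \oplus 1)$; return $(\epsilon, T)$
 4. $c_0 \leftarrow ((M \neq \epsilon) \wedge (r \mid |A|))\ ?\ 1 : 2 : 3 : 4$
 5. $c_1 \leftarrow ((A \neq \epsilon) \wedge (r \mid |M|))\ ?\ 1 : 2 : 5 : 6$
 6. if $A \neq \epsilon$
 7.     $IV \leftarrow \mathsf{HASH}_r(IV, A, c_0)$
 8. if $M \neq \epsilon$
 9.     $(M[1], \ldots, M[m]) \xleftarrow{r} M$
10.     for $i = 1$ to $m$
11.         $(Y, Z) \xleftarrow{r} \mathsf{Photon}_{256}(IV)$
12.         $(W, C[i]) \leftarrow \rho(Y, M[i])$
13.         $IV \leftarrow W \| Z$
14.     $IV \leftarrow IV \oplus c_1$
15.     $C \leftarrow (C[1] \| \cdots \| C[m])$
16. $T \leftarrow \mathsf{TAG}_{128}(IV)$
17. return $(C, T)$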


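Algorithm Photon-Beetle-$\mathcal{D}[r]_K(N, A, C, T)$

 1. $IV \leftarrow N \| K$, $M \leftarrow \epsilon$
 2. if $(A = \epsilon) \wedge (C = \epsilon)$
 3.     $T^* \leftarrow \mathsf{TAG}_{128}(IV \oplus 1)$
 4.     return $(T = T^*)\ ?\ \epsilon : \bot$
 5. $c_0 \leftarrow ((C \neq \epsilon) \wedge (r \mid |A|))\ ?\ 1 : 2 : 3 : 4$
 6. $c_1 \leftarrow ((A \neq \epsilon) \wedge (r \mid |C|))\ ?\ 1 : 2 : 5 : 6$
 7. if $A \neq \epsilon$
 8.     $IV \leftarrow \mathsf{HASH}_r(IV, A, c_0)$
 9. if $C \neq \epsilon$
10.     $(C[1], \ldots, C[m]) \xleftarrow{r} C$
11.     for $i = 1$ to $m$
12.         $(Y, Z) \xleftarrow{r} \mathsf{Photon}_{256}(IV)$
13.         $(W, M[i]) \leftarrow \rho^{-1}(Y, C[i])$
14.         $IV \leftarrow W \| Z$
15.     $IV \leftarrow IV \oplus c_1$
16.     $M \leftarrow (M[1] \| \cdots \| M[m])$
17. $T^* \leftarrow \mathsf{TAG}_{128}(IV)$
18. return $(T = T^*)\ ?\ M : \bot$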


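Algorithm $\mathsf{HASH}_r(IV, D, c_0)$

1. $D[1] \| \cdots \| D[d] \xleftarrow{r} \mathsf{ozs}_r(D)$
2. for $i = 1$ to $d$
3.     $(Y, Z) \xleftarrow{r} \mathsf{Photon}_{256}(IV)$
4.     $W \leftarrow Y \oplus D[i]$
5.     $IV \leftarrow W \| Z$
6. $IV \leftarrow IV \oplus c_0$
7. return $IV$


Algorithm $\mathsf{TAG}_{\tau}(T[0])$

1. for $i = 1$ to $\lceil \tau/128 \rceil$
2.     $T[i] \leftarrow \mathsf{Photon}_{256}(T[i-1])$
3. $T \leftarrow \mathsf{Trunc}_{128}(T[1]) \| \cdots \| \mathsf{Trunc}_{128}(T[\lceil \tau/128 \rceil])$
4. return $T$


Algorithm $\rho(S, U)$

1. $V \leftarrow \mathsf{Trunc}_{|U|}(\mathsf{Shuffle}(S)) \oplus U$
2. $S \leftarrow S \oplus \mathsf{ozs}_r(U)$
3. return $(S, V)$


Algorithm $\rho^{-1}(S, V)$

1. $U \leftarrow \mathsf{Trunc}_{|V|}(\mathsf{Shuffle}(S)) \oplus V$
2. $S \leftarrow S \oplus \mathsf{ozs}_r(U)$
3. return $(S, U)$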


Fig. 6. Algorithms of Photon-Beetle [6, Fig. 3.6]


Abstract. Zero-effort deauthentication (ZED) aims to log out a user from a
computer terminal, if not in use, without any user intervention. A representative 
instance of such an approach is ZEBRA, which makes use of a wrist-worn wear- 
able device (e.g., smartwatch) to deauthenticate the user if the activities on the 
computer terminal (e.g., typing) do not match with the user’s wrist movements. 
In this paper, we present VibRaze (which stands for Vibration-enabled
Relay Attacks on Zero-EffoRt deauthentication), a new class of potentially devastating
relay attacks against ZED (specifically, a prominent ZED instance ZEBRA)
based on the ubiquitous and inconspicuous vibration capability of the underlying 
wrist-wearable. Since merely launching a ghost-and-leech relay attack against 
these schemes is not going to bypass their security, VibRaze additionally creates 
vibrations on the wrist-wearable remotely (e.g., through a phone call) while the 
attacker attempts to defeat the deauthentication functionality of the ZEBRA sys- 
tem. This serves to defeat ZEBRA since the vibration-triggered movements at the 
wrist-wearable highly correlate with the typing events at the terminal. We design 
and evaluate VibRaze against ZEBRA’s machine learning design demonstrating 
that it can allow the attacker to remain logged into the terminal and perform 
typing activity at will, while the user remains oblivious to the ongoing attack. 
VibRaze represents a significantly powerful and challenging threat to address due 
to its remote and inconspicuous nature. Nevertheless, we provide some potential 
mitigation strategies that may be used to reduce the impact of VibRaze. 


1 Introduction 


Any usable authentication system should include a deauthentication mechanism, i.e.,
a means for promptly detecting when to log out a previously authenticated user from 
an ongoing session at a computer terminal. To improve the usability of deauthenti- 
cation, it is crucial to make it oblivious to users by eliminating the cognitive effort 
required of them. Although such zero-effort deauthentication (ZED) schemes are com- 
pelling, designing them correctly can be a challenge in practice given the obvious ten- 
sion between the underlying usability and security requirements. 

A concrete representative ZED approach is ZEBRA, a zero-effort bilateral deauthentication
method, proposed by Mare et al. [21]. ZEBRA is geared for scenarios
where users authenticate to computer terminals (such as desktop computers in a collaborative
setting). In such scenarios, users typically have to either manually deauthenticate
themselves by logging out or locking the terminal, or the terminal can deauthenticate
a user automatically after a sufficiently long period of inactivity. The former approach 
requires explicit user effort while the latter approach reduces promptness of log out. 
ZEBRA makes the process of deauthentication both prompt and transparent: once a 
user is authenticated to a terminal (using say a password), it continuously, yet trans- 
parently re-authenticates the user so that prompt deauthentication is possible without 
explicit user action. In ZEBRA, the user is required to wear a device (e.g.,
a smartwatch or bracelet) equipped with motion sensors on the wrist of the mouse-holding hand.
The bracelet is wirelessly connected and paired to the terminal, which compares the 
sequence of events it observes (e.g., keyboard/mouse interactions) with the sequence 
of events predicted using measurements from the device’s motion sensors. The logged- 
in user is deauthenticated when the two sequences no longer match. The application 
scenarios for ZEBRA can very well extend beyond the shared-space setting and may 
include (de)authentication to a personal computer or a laptop, or even a mobile phone,
for example to enhance the security of Google Smart Lock [9].

In this paper, we present VibRaze, a new class of potentially devastating relay attacks
against ZED (specifically a prominent ZED instance, ZEBRA) based on the ubiquitous
and inconspicuous vibration capability of the underlying wrist-worn wearable
device. VibRaze involves mounting a ghost-and-leech relay attack and, crucially, creating
vibrations on the wearable device of the remotely located user (e.g., through a
phone call or notifications), as shown in Fig. 1b. The leech is installed at the
remote location where the user’s wearable device resides (e.g., home, office, or cafeteria),
and the ghost is installed near the terminal. Once the ghost and leech pairs have
been installed, the attacker does not need to be in close proximity of the remote user. 
To break the security of ZEBRA, while creating vibrations on the wearable device, the 
attacker performs typing activity at will on the terminal unbeknownst to the victim. The 
vibration functionality serves to defeat ZEBRA since vibration creates subtle move- 
ments on the device that match the characteristics of a typing activity being applied at 
the terminal. 

Ghost-and-leech relay attacks have already been demonstrated to be practical for 
various short range wireless communication technologies like Bluetooth [18], RFID [7, 
14] and NFC [8], making this vulnerability a serious threat. We also implemented and 
tested the ghost-and-leech relay attack over the Bluetooth channel, the wireless medium 
used in ZEBRA. However, such standard relay attacks do not work to defeat ZEBRA 
because ZEBRA involves correlating user’s hand movements captured by the wearable 
with activities on the terminal (as shown in Fig. 1a). Therefore, simply relaying the
wireless communication between two end points as in standard attacks is not sufficient 
to bypass the security of ZEBRA. We address this challenge by introducing the notion 
of vibration-based relay attacks. 

Since the attack is triggered by a simple and random spam call/notification, the user 
remains unaware of the ongoing attack. For instance, a call ring/vibration typically lasts 
for roughly 20s (if the call is not picked up), during which an attacker can type in 
60-65 characters (given the typing speed of an average person is 190-200 characters 
per minute [20]). Even with these limited characters, the attacker can swiftly perform 
various nefarious activities on the terminal, such as deleting important files, uploading
private files to the attacker’s server, or changing dosages and making fake prescriptions in
a hospital setting, and then leave to evade detection. Due to the zero-effort nature of
the underlying schemes, the user cannot determine whether the spam call/notification was
received as part of an attack on these schemes. Conversely, if the user is required to
be diligently present in the loop to detect such attacks, the schemes will no longer be 
zero-effort. 


Our Contributions: We believe that our work makes the following novel contributions: 


1. A Fundamental New Vulnerability in Zero-Effort Deauthentication (ZED): We 
introduce VibRaze, a new vulnerability associated with ZED based on bilateral activity
correlations. It takes the form of a standard ghost-and-leech relay attack augmented
with remote vibrations, triggered by simple phone calls/notifications, that
allow the attacker to remain logged into an authentication terminal.

2. VibRaze Design based on Vibration Triggers: We design VibRaze as a signif- 
icant extension to the standard relay attacks based on two types of remote vibra- 
tion triggers — Call-Vib, the vibration created on the watch/bracelet when there is a 
call on the phone (a companion device of the watch), and Notif-Vib, the vibration 
generated on the watch when there is a notification (e.g., a message, an email) for 
the user. We observed that the motion sensor readings associated with the vibra- 
tion mostly (88.31%) match with those of typing activity, which enables VibRaze 
to launch a relay attack enhanced with vibration against ZEBRA when the user is 
located remotely doing other activities. 

3. Evaluation of VibRaze: We evaluate VibRaze against ZEBRA’s machine learning 
design demonstrating that it can allow the attacker to remain logged into the terminal 
(with 100% probability in most cases) and perform typing activity at will,
during which the attacker can perform malicious activities on the terminal (such as deleting
important files, uploading private files to the attacker’s server, or changing dosage 
and writing new prescriptions in a hospital setting). Since the attack is triggered by 
a simple spam call/notification, the user remains unaware of the ongoing attack. 


The rest of the paper is organized as follows. In Sect. 2, we review ZEBRA, proximity
attacks against it, and a recently proposed defensive system. In Sect. 3, we introduce
our remote attack system, VibRaze, against ZEBRA, followed by Sect. 4, where we
provide details on the design of VibRaze. Next, in Sect. 5, we present data collection
procedures. We evaluate VibRaze against ZEBRA in Sect. 6. Finally, in Sect. 7, we discuss
potential mitigation strategies against VibRaze, review related literature in Sect. 8,
and conclude our work in Sect. 9.


2 Background: ZEBRA Review 


In ZEBRA, the user, once authenticated, is continuously and transparently re- 
authenticated making the deauthentication process prompt without any explicit user 
interaction. 


System Architecture: ZEBRA considers a terminal with keyboard and mouse, and a 
watch equipped with motion sensors (i.e., accelerometer and gyroscope). The readings
recorded by the watch’s embedded sensors capture the wrist movement of the wearer.
The terminal keeps track of the watch associated with each authorized user. Once a 
user is authenticated to the terminal, ZEBRA continuously re-authenticates the user 
by correlating his interactions at the terminal with the interactions predicted based on 
the motion signal captured by the watch. If they do not correlate, ZEBRA promptly 
deauthenticates the current user. 

ZEBRA considers three types of interactions: typing on the keyboard, mouse 
scrolling, and keyboard-to-mouse and mouse-to-keyboard hand movements (termed as 
“MKKM”). When comparing interactions, ZEBRA considers three parameters — win- 
dow size (w), threshold (m), and grace period (g). ZEBRA compares a sequence of w 
interactions (a window) at a time. A window is marked ‘1’ if the fraction of matching 
interactions exceeds a threshold m; otherwise, it is marked ‘0’. If ZEBRA marks ‘0’ for
g consecutive windows, it outputs “different” and instantly deauthenticates the user.
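
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [21]: the function names are hypothetical, the per-interaction match results are assumed to be available as booleans, and windows are assumed to be consecutive and non-overlapping.

```python
from typing import List

def window_marks(matches: List[bool], w: int, m: float) -> List[int]:
    """Mark each window of w interactions '1' if the matching fraction exceeds m, else '0'.

    Assumption: windows are consecutive and non-overlapping.
    """
    marks = []
    for start in range(0, len(matches) - w + 1, w):
        window = matches[start:start + w]
        marks.append(1 if sum(window) / w > m else 0)
    return marks

def should_deauthenticate(marks: List[int], g: int) -> bool:
    """Return True once g consecutive windows have been marked '0'."""
    run = 0
    for mark in marks:
        run = run + 1 if mark == 0 else 0
        if run >= g:
            return True
    return False

# Five matching interactions followed by ten mismatching ones.
matches = [True] * 5 + [False] * 10
marks = window_marks(matches, w=5, m=0.6)
print(marks, should_deauthenticate(marks, g=2))
```

With a larger grace period the same mismatching run is tolerated for longer, which is the trade-off between promptness and false deauthentication that g controls.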

In ZEBRA, the wrist-wearable can be a wrist-band, such as those from Fitbit [6] and 
Xiaomi [28], or a general-purpose smartwatch, such as LG G Watch R that we have used 
in our implementation. Typically, being constrained devices, these wrist-bands need
to remain connected to a companion device, a smartphone in particular, to
function properly. Like smartwatches, these wrist-bands offer a vibration
feature to alert the user when the user’s phone rings or receives notifications.


Adversary Model: ZEBRA considers the threat of unauthorized access to the termi- 
nal when a user leaves the terminal without logging out and remains in its proximity 
doing other tasks. Specifically, ZEBRA considers two types of adversaries. First, an 
innocent adversary, a user who uses an unattended terminal for his own purposes without
realizing that another user (‘victim’) is logged in, or because he does not want to
go through the login process. Second, a malicious adversary who deliberately uses the
already logged-in terminal with the intent to perform some action impersonating the 
victim. The malicious individual may observe and mimic the actions of the victim user 
using another nearby terminal to fool the terminal into falsely authenticating himself as 
the victim user. Mare et al. [21] have demonstrated that their system was able to detect 
and deauthenticate both innocent and malicious adversaries in a reasonable time while 
maintaining low false negative rates. 


3 Overview and Threat Model 


VibRaze represents a new threat, potentially a devastating one, to ZEBRA that can
compromise the system’s security when the victim user is located remotely, far away from
the authentication terminal in question. VibRaze involves mounting a ghost-and-leech
relay attack, and creating vibrations on the wearable device of the user located remotely
(e.g., through a phone call/notifications) while the attacker performs the typing activity
on the terminal unbeknownst to the victim. VibRaze assumes a malicious adversary
who attempts to access an already logged-in terminal (similar to ZEBRA) or who
has compromised the user’s credentials (via phishing attacks, password database leaks,
or other mechanisms) and attempts to access the terminal on behalf of the user.
Specifically, to compromise the security of the ZEBRA system, VibRaze follows the steps
below. Figure 1 presents a visualization of VibRaze.


(b) VibRaze: A ghost-and-leech relay attack enhanced with remote vibration. This attack
can succeed to defeat ZEBRA as vibrational motions on the wearable device match with 
keyboard activity on most occasions. 


Fig. 1. VibRaze Attack Model with and without vibration. “KB-only Attack”: the attacker inter- 
acts with victim terminal using only keyboard. 


(1) Mounting Ghost-and-Leech Attack: To defeat ZEBRA, VibRaze mounts a ghost- 
and-leech relay attack that enables the watch to remain connected with the terminal 
even after the wearable goes out of the terminal’s proximity. Ghost-and-leech relay 
attacks have already been demonstrated to be practical for various short-range wireless 
communication technologies like Bluetooth [18], RFID [7,14] and NFC [8]. VibRaze 
can employ a ghost-and-leech attack following the approach similar to any of these 
works. In fact, we implemented and tested the ghost-and-leech relay attack on Bluetooth 
channel, the wireless medium utilized in ZEBRA, as described in Sect. 4.2. In VibRaze, 
an adversary implants a ghost near the terminal, and a leech at the location where the 
watch of the user has a high chance to be present, e.g., at home or on a car (such 
as hidden in the exterior of the car somewhere). The leech emulates the terminal and 
receives the stream of motion sensor data from the watch and relays it to the ghost. 
The ghost emulates the watch and transfers the stream of motion data received from the 
leech to the terminal. Thus, launching the relay attack enables the terminal to remain 
connected to the watch and allows it to continuously receive the motion sensor data from 
the watch. Figure 1a shows this plain ghost-and-leech relay attack against ZEBRA.
When the adversary launches the relay attack, the user at the remote location may
perform various wrist activities that are different from the user-terminal interactions.
To continuously and transparently re-authenticate the user, ZEBRA correlates the wrist
movements with the terminal activities. Mare et al. have demonstrated that ZEBRA can
correctly detect the non-terminal activities with high accuracy (as shown in Figs. 5 and
6 in [21]), and deauthenticate the current user in reasonable time. Therefore, the plain
ghost-and-leech relay attack alone is not sufficient for the adversary to compromise 
ZEBRA. 


(2) Creating Remote Vibration Triggers: As launching a plain ghost-and-leech relay
attack alone is not sufficient to break the ZEBRA system, to fool
ZEBRA into keeping him logged in on behalf of the victim user, the attacker needs an additional
step of vibrating the watch to make the motion signal match the typing activity at the
terminal. Figure 1b shows the VibRaze attack consisting of the ghost-and-leech relay
attack enhanced with remote vibrations. The attacker can make the user’s watch
vibrate through spam calls, messages sent through one of the messaging applications
(e.g., default text messaging app, Viber, Facebook Messenger), spam emails, etc.
These spam calls, messages, and emails are originally sent to the user’s phone, the 
companion device for the watch, which in turn creates a vibration on the connected 
watch for notifying the user. The attacker can obtain the contact details of the victim 
user through various approaches such as phishing attacks, leaked databases, or other 
mechanisms [10, 15–17, 24, 25, 27]. We note that VibRaze does not need the attacker
to continuously ring/vibrate the victim’s watch. The attacker can devise a strategy to
periodically and randomly ring the watch such that it does not alert the victim while the
attacker accomplishes his intended task. In fact, VibRaze can be launched for the duration of a
single call (or a couple of messages), paused for a sufficiently long time,
and then repeated, thereby making it inconspicuous to the victim
user. One may assume that the victim can block a phone number if he receives spam
calls/messages from it. However, the attacker can use different phone numbers over the 
duration of the attack to generate spam calls and messages. 

We note that the first step in VibRaze, i.e., mounting the relay attack, is performed 
only once while the second step of creating a vibration on the user’s watch can be performed
multiple times at the attacker’s discretion such that the ongoing attack remains
unnoticed by the victim user. Further, VibRaze considers a keyboard-only attack where
the attacker interacts with the terminal using only the keyboard at the time when there is 
a vibration on the user’s watch due to an ongoing call or a message notification. Since 
motion signals associated with vibrations are mostly classified as typing interactions
(explained next), employing a keyboard-only attack enables the attacker to successfully 
compromise the ZEBRA system. 


Why Vibration?: ZEBRA considers three types of interactions: typing, scrolling, and
MKKM. When a motion segment corresponding to an interaction is fed to the Interaction
Classifier, it is always classified as one of these classes/interactions. Therefore, a
keyboard-only attacker can fool ZEBRA into treating him as a legitimate
user if he can force the user’s watch to create a motion that is close to the typing
interaction. In VibRaze, the attacker utilizes the vibration motor available on the watch
to create such motion. Specifically, as mentioned earlier, the attacker
makes the user’s watch vibrate by making a spam phone call or sending a series of
messages. When the watch vibrates, it creates a certain motion on the device which, when
fed to ZEBRA, gets mapped to one of the interactions considered in ZEBRA (mostly
to typing, as detailed below).

To find out which class the Interaction Classifier of ZEBRA assigns to the motion
corresponding to vibration, we mismatched all the actual interactions from 45
user samples (collected in our study as explained in Sect. 5) with the vibration-triggered
motion sample. While mismatching, the starting point of the vibration-triggered motion
sensor sample was aligned with the starting point of the interaction samples. Since we
are interested in the keyboard-only attack, we removed all the mouse-related interactions
(scrolling and MKKM) from the actual interaction sequence during mismatching.
Vibration motion data was then segmented into blocks based on the timestamps of
actual interactions, features were extracted from each block, and these were fed to the already
trained Interaction Classifier. The result shows that most (88.31%, 8687/9836) of the
vibration motion sensor data are classified as typing, and the rest (11.68%, 1151/8685) are
classified as scrolling. Since, for the Interaction Classifier, vibration on the watch closely
matches typing, forcing the user’s watch to vibrate can enable VibRaze to
compromise the security of ZEBRA.

Further, the user study of [22] with 113 Amazon Mechanical Turk workers has 
shown that the majority of the users keep their smartphones either in ringer or vibration 
mode most of the time at home and while asleep. We note that ringer mode generally
includes vibration. We believe that similar results would apply in the case of smartwatches.
The majority of smartwatches (e.g., LG G Watch R, Sony Smartwatch 3)
do not come with built-in speakers, which leaves the user with only two options: to set
the watch in vibration mode or in silent mode. As in the case of the phone, it is reasonable
to assume that the majority of users would keep their watches in vibration mode,
the default mode. Given that the users would mostly keep their watches in the vibration 
mode, and the motion segments corresponding to vibration are mostly mapped to typ- 
ing in ZEBRA, we design VibRaze by carefully considering different vibration patterns 
such that they match with the typing activities. 


Broader Implications — Extension to VibRaze: VibRaze can further be extended to 
launch an attack against various other security schemes, particularly against authoriza- 
tion schemes based on motion sensors. For instance, the vibration-based attack notion 
underlying VibRaze may be applied to defeat OpenSesame [29], a lock/unlock sys- 
tem for smartphones based on the hand waving pattern of the users. VibRaze may be 
designed to generate the vibration pattern on the phone that matches the hand waving 
pattern of the user, thereby fooling the system to unlock the device. The accelerometer- 
based tapping mechanism proposed in Tap-Wave-Rub [19] geared for NFC applications 
may also be defeated by carefully updating VibRaze. The vibration pattern generated 
in such a modified VibRaze may match with the tapping gesture of the user. Several 
other schemes that rely on motion sensors may also be vulnerable to VibRaze. Further 
investigation is needed to measure the extent to which the vibration pattern matches the
gesture activities used in security schemes.


4 Design and Implementation 


4.1 Implementation of ZEBRA 


Software: We designed and developed two applications: Wear-app (Android) and
Desktop-app (Java). Wear-app runs on the LG watch and captures the wearer’s wrist
motion measurements while Desktop-app runs on the terminal and captures the actual
keyboard-mouse interaction observed on the terminal. The watch synchronizes its clock
with the terminal and transmits the motion sensor data to the terminal at a sampling
rate of 200 Hz through Bluetooth. We utilized MATLAB to implement the rest of the components
of ZEBRA with the functionalities as described in [21].


Feature Set and Classifier: As in [21], we used the same set of 12 features from
each of the accelerometer and gyroscope. The list of features is shown in Table 1. We
also used a RandomForest classifier with 100 weak learners. Each weak learner considers
sqrt(n) features, where n (= 24) is the total number of features. All the classes
under consideration were weighted to account for any imbalances in the training dataset.
Further, the exact parameters provided in [21] (shown in Table 2) were used in our
implementation.


Table 1. List of features used in our implementation.

Feature            | Description
Mean               | mean value of signal
Median             | median value of signal
Variance           | variance of signal
Standard Deviation | standard deviation of signal
MAD                | median absolute deviation
IQR                | inter-quartile range
Power              | power of signal
Energy             | energy of signal
Peak-to-peak       | peak-to-peak amplitude
Autocorrelation    | similarity of signal
Kurtosis           | peakedness of signal
Skewness           | asymmetry of signal
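
As a rough illustration of Table 1, the twelve statistics can be computed for one block of sensor samples with the Python standard library. This sketch is not the MATLAB implementation used in the study. The exact definitions are assumptions (population variance, lag-1 autocorrelation, non-excess kurtosis), as is the idea that each sensor is first reduced to a one-dimensional signal such as its magnitude. The function names are hypothetical.

```python
import math
import statistics as st
from typing import Dict, List, Sequence

def signal_features(samples: Sequence[float]) -> Dict[str, float]:
    """Compute the 12 features of Table 1 for one block (needs at least 2 samples)."""
    x = [float(v) for v in samples]
    n = len(x)
    mean = st.fmean(x)
    median = st.median(x)
    var = st.pvariance(x, mu=mean)          # assumption: population variance
    std = math.sqrt(var)
    q1, _, q3 = st.quantiles(x, n=4)        # quartile cut points
    dev = [v - mean for v in x]
    energy = sum(v * v for v in x)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0    # guard against constant signals

    return {
        "mean": mean,
        "median": median,
        "variance": var,
        "std": std,
        "mad": st.median([abs(v - median) for v in x]),
        "iqr": q3 - q1,
        "power": energy / n,
        "energy": energy,
        "peak_to_peak": max(x) - min(x),
        # assumption: lag-1 autocorrelation as the "similarity of signal"
        "autocorrelation": ratio(sum(a * b for a, b in zip(dev, dev[1:])), n * var),
        "kurtosis": ratio(sum(d ** 4 for d in dev) / n, var ** 2),
        "skewness": ratio(sum(d ** 3 for d in dev) / n, std ** 3),
    }

def feature_vector(accel: Sequence[float], gyro: Sequence[float]) -> List[float]:
    """Concatenate accelerometer and gyroscope features into a 24-value vector."""
    return [*signal_features(accel).values(), *signal_features(gyro).values()]

demo = signal_features([1, 2, 3, 4])
print(len(demo), len(feature_vector([1, 2, 3, 4], [0, 1, 0, 1])))
```

A vector of this shape could then be passed to any random-forest implementation. In scikit-learn, for instance, `RandomForestClassifier(n_estimators=100, max_features="sqrt", class_weight="balanced")` would roughly mirror the configuration described above, although the study itself used MATLAB.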


Table 2. Parameters and their values used in our implementation of ZEBRA.

Parameter            | Value | Parameter            | Value
Minimum duration     | 25 ms | Window size (w)      | 5-30
Maximum duration (a) | 1 s   | Match threshold (m)  | 50-70 %
Idle threshold (a)   | 1 s   | Overlap fraction (f) | 0
Grace period (g)     | 1, 2  |                      |

(a) For MKKM, the idle threshold and maximum duration are 5 s.


4.2 Implementation of Relay Attack 


As a prerequisite to launching the attack, we implemented and tested a ghost-and-leech
relay attack against the Bluetooth channel (the wireless channel used in ZEBRA) utilizing
the Python-Bluez and socket libraries. The relay attack consists of two attacking
devices: a ghost (or the attacker’s watch, G) and a leech (or the attacker’s terminal, L),
each with a reprogrammable Bluetooth device. They communicate with each other
through a network connection. We developed two Python scripts using the Python-Bluez
and socket libraries for these two attacking devices. Unlike the attacking devices, the
terminal (T) and the watch (W) have fixed MAC addresses. The ghost device (G) and
the leech device (L) clone the MAC addresses of the original W and T, respectively. With
this setup, G appears to be W to T while L appears to be T to W. Initially, G and L
establish a network socket connection to forward messages between them. Once they
are connected, L starts the Bluetooth socket server to accept the incoming connection
request from W, and G establishes a Bluetooth connection to T. Upon establishing
Bluetooth connections between G–T and L–W, all the messages including challenge-response
messages and motion sensor measurements are relayed through G and L from
W to T (and vice versa). Thus, the relay attack was performed successfully. Since the
ghost-and-leech attack can be launched stealthily by installing the G and L devices at their
respective locations and following approaches similar to [7, 8, 14, 18], or our implementation
as described earlier, in our study, we mainly focus on designing and analyzing
the additional step of creating the remote vibrations on the victim’s watch.
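
The message flow W → L → G → T described above amounts to forwarding bytes across three links. The sketch below only illustrates that forwarding pattern with local socket pairs from the Python standard library. It involves no Bluetooth, no address cloning, and none of the scripts developed in the study, and all names are hypothetical.

```python
import socket

def relay_once(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Read one chunk from src and forward it unchanged to dst; return bytes relayed."""
    data = src.recv(bufsize)
    if data:
        dst.sendall(data)
    return len(data)

# Local stand-ins for the three links: W-L (Bluetooth), L-G (network), G-T (Bluetooth).
watch, leech_bt = socket.socketpair()
leech_net, ghost_net = socket.socketpair()
ghost_bt, terminal = socket.socketpair()

watch.sendall(b"motion-sample")      # W sends a sensor reading
relay_once(leech_bt, leech_net)      # L forwards it over the network link
relay_once(ghost_net, ghost_bt)      # G forwards it to T
received = terminal.recv(4096)
print(received)

for s in (watch, leech_bt, leech_net, ghost_net, ghost_bt, terminal):
    s.close()
```

Traffic in the opposite direction (for example, challenge messages from T to W) would follow the same pattern with the roles of source and destination swapped.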


4.3 Design of VibRaze’s Attack Scenarios 


To evaluate the effectiveness of VibRaze against ZEBRA, we consider various scenarios 
based on the vibration type and the watch position as described below. 


Vibration Type: We consider two types of vibrations on the user’s watch — Call-Vib and 
Notif-Vib. Call-Vib is the vibration generated on the watch due to a call. In Call-Vib, the 
watch vibrates for approximately 20 s, assuming the call is not picked up, or is dropped. 
The watch may vibrate for a shorter period if the user picks up the call or disconnects 
it. However, an attacker can call the victim user multiple times after a certain time
gap so that the attack remains unnoticed by the user. Notif-Vib is the vibration created
on the watch due to the notification of a text message, or an email. In Notif-Vib, the 
watch vibrates for a very short duration (approximately 500 ms). Similar to Call-Vib, 
the attacker can send multiple messages making the watch vibrate for a longer duration. 

To evaluate VibRaze against ZEBRA with these two vibration setups, we 
updated/programmed the Wear-app to create Call-Vib and Notif-Vib. Since the vibration 
on the watch when its companion device (i.e., the phone) rings, specifically Call-Vib, 
generally follows a vibration-pause-vibration pattern, we simulated Call-Vib with the 
duration of (1000-2000-1000) ms based on the vibration pattern generated on the LG 
G watch when its companion device rings. Similarly, we simulated Notif-Vib, i.e., the 
vibration generated when the watch receives a text message or an email with a short 
vibration of 500 ms. To simulate multiple calls, Wear-app plays the Call-Vib repeatedly 
with an inter-Call-Vib gap of (2000-3000) ms while it plays the Notif-Vib repeatedly 
with an inter-Notif-Vib gap of 1000 ms to simulate multiple text messages. 
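
The two simulated patterns can be written down as on/off schedules. The following is a minimal sketch, not the Wear-app code: the function names are hypothetical, and the inter-Call-Vib gap, which the text gives as a range of 2000 to 3000 ms, is fixed here at an assumed 2500 ms.

```python
from typing import List, Tuple

# (state, duration_ms) segments; durations are taken from the text above.
CALL_VIB = [("on", 1000), ("off", 2000), ("on", 1000)]
NOTIF_VIB = [("on", 500)]


def schedule(pattern: List[Tuple[str, int]], gap_ms: int, total_ms: int) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) intervals during which the motor vibrates."""
    intervals = []
    t = 0
    while t < total_ms:
        for state, duration in pattern:
            if t >= total_ms:
                break
            end = min(t + duration, total_ms)
            if state == "on":
                intervals.append((t, end))
            t = end
        t += gap_ms  # pause before the pattern repeats
    return intervals


def duty_cycle(intervals: List[Tuple[int, int]], total_ms: int) -> float:
    """Fraction of the session during which the watch is vibrating."""
    return sum(end - start for start, end in intervals) / total_ms


call = schedule(CALL_VIB, gap_ms=2500, total_ms=13000)  # 2500 ms gap is an assumption
notif = schedule(NOTIF_VIB, gap_ms=1000, total_ms=3000)
print(call)
print(notif, duty_cycle(notif, 3000))
```

Under these assumed values, repeated Notif-Vib keeps the motor on for one third of the time, with pauses of only 1000 ms between vibrations.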


Watch Position: We consider VibRaze in two settings: On-Wrist, where the watch is worn
by the user, and Off-Wrist, where the watch is not worn. In the On-Wrist setting, we
consider various scenarios based on the real-life activities of users. For instance, we
consider a driving scenario that reflects the attack setting where the victim user is
driving his car, or riding a bus/train. We also consider a using-phone scenario, where
the victim user is using his phone for purposes such as typing/reading a text/email, or
surfing the internet. Next, we consider the writing scenario, where the user performs a
hand-writing task. Further, we consider the typing scenario, where the user uses his
personal computer or laptop at home or elsewhere. Since users also walk in daily life,
we consider the walking scenario. Moreover, we consider miscellaneous wrist-activities,
which represent an attack scenario where the user is resting on a sofa or a chair and
moving his hand at random. Users may perform several other activities in day-to-day
life, and covering all of them to evaluate the performance of VibRaze is not possible in
this study. However, we try to cover the user’s activities by categorizing them into the
following classes: driving, using phone, writing, typing, and miscellaneous.
“Miscellaneous” represents all the random activities that the user performs while
sitting on a sofa or a chair. During all these activities, we assume that users are
wearing the watch designated for authentication.

While at home, users may take off their watches and keep them on a desk or on
another surface for charging or for other reasons. Users are most likely to take off their
watches before they go to bed. We refer to such a non-worn setting as Off-Wrist.
Unlike a table/desk, some surfaces where users keep their watches may dampen the
vibration. Therefore, we consider two types of surfaces: (a) a smooth surface that does
not dampen the vibration, e.g., a wooden table surface, and (b) a surface that can dampen
the vibration, e.g., a sofa or a pillow.


5 Data Collection 


To evaluate the performance of our implementation of ZEBRA (in benign and adversarial
settings) and that of VibRaze, we recruited 15 participants (mostly graduate students,
18-35 years old, 11 males and 4 females). All participants were right-handed.
Participants were told that the purpose of the study was to evaluate the feasibility of
using wrist-motion while interacting with the terminal to authenticate the user. Before
starting the experiment, they were asked about their general demographics. During the
experiment, participants performed three 10-minute tasks of filling a web form similar
to [21]. From each task, two sets of data were collected: (i) motion data, i.e.,
accelerometer and gyroscope sensor readings, from the user’s watch, and (ii) the user’s
activities on the terminal, i.e., actual interactions identified by the Interaction
Extractor on the terminal. The 15 user sessions thus resulted in a total of 45 samples.
All the experiments were conducted in a lab setting. Our experiment and data collection
followed the IRB procedures at our institution.

To evaluate VibRaze against ZEBRA, we collected the motion sensor data for each
of the attack settings detailed in Sect. 4.3. For the Off-Wrist setting, the watch was
placed on top of three different surfaces: a wooden table, a soft pillow, and a sofa.
Since, unlike the wooden table, the soft pillow and the sofa can dampen the intensity of
the vibration on the watch, they emulate the scenario where the watch is placed on top
of a vibration absorber. In our work, one of the researchers involved in the study played
the role of the victim, and data was collected corresponding to this user in different
scenarios of the On-Wrist setup. For the On-Wrist setting, motion sensor data was
recorded while the victim was performing various regular activities with the watch on
his right hand. We collected motion data when the victim was walking at his regular
pace, and when he was writing some text from a random wiki link on a sheet of paper.
Motion data was also collected when the victim was using his phone. On the phone, the
victim performed the following tasks: typing the provided text, browsing
Facebook/Instagram, and checking email. Further, motion sensor data was collected when
the victim was filling a simple web form similar to the one used earlier. Moreover,
motion sensor data was collected when the victim was driving a car at a speed between 20
and 60 mph. We also collected motion data when the victim was sitting on a chair and
communicating with other people. In this setting, the victim moves his hand at random.
We term the wrist-activity of this setting “miscellaneous”. Each of these activities was
performed for 10 min. We note that only motion data was collected when the victim was
performing these activities. In total, we collected 16 motion samples: eight samples
(3 with the Off-Wrist and 5 with the On-Wrist setup) for each of the Call-Vib and
Notif-Vib settings. This pool of motion samples represents various attack settings of
VibRaze with different user activities and watch placements.


6 Analysis and Results 


In this section, we evaluate the performance of our implementation of ZEBRA and that 
of VibRaze against ZEBRA in various scenarios. 


6.1 Performance of ZEBRA


Fig. 2. Legitimate users.


Performance with Legitimate Users: To evaluate the performance of our implementation
of ZEBRA with legitimate users, we employ the same approach as in [21] on the data
samples collected from 15 users. Specifically, we compute the False Negative Rate (FNR)
as the fraction of interaction windows from a user that the Authenticator incorrectly
outputs as from “different user”. We also employ the leave-one-out cross-validation
approach: for a given user, we train the classifier using the 42 data samples collected
from the other 14 user sessions, and the three data samples from the current user are
used to test the model. Thus, we build 15 different classifiers and report the
aggregate classification results over the 45 samples.
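
The per-user split described above can be sketched as follows. This is an illustration only: the sample representation (a user identifier paired with a task index) and the function name are assumptions, and a real sample would also carry sensor readings and interaction labels.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (user_id, task_index); a placeholder for a real data sample


def leave_one_user_out(samples: List[Sample]) -> Dict[str, Tuple[List[Sample], List[Sample]]]:
    """Map each user to (training samples from other users, test samples from that user)."""
    folds = {}
    users = sorted({user for user, _ in samples})
    for held_out in users:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        folds[held_out] = (train, test)
    return folds


# 15 participants, three 10-minute tasks each, as in the data collection.
samples = [(f"user{u:02d}", task) for u in range(15) for task in range(3)]
folds = leave_one_user_out(samples)
print(len(folds), {(len(train), len(test)) for train, test in folds.values()})
```

Each of the 15 folds trains on 42 samples and tests on the 3 samples of the held-out user, matching the counts stated above.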

With our implementation of ZEBRA, we achieved FNRs in the range of 0-10% (as shown
in Fig. 2a), which is in line with the 0-18% reported in [21]. We achieved FNRs
below 6% for window sizes above 15. Similar to [21], we fixed w = 20 and m = 60% to
estimate the length of time (in terms of the number of windows) for which a legitimate
user remained logged in. Figure 2b shows the fraction of users remaining logged in
after ‘n’ authentication windows for a grace period (g) of 1 and 2. With g = 1, ZEBRA
recognizes 89% of the users as legitimate for the entire session, while with g = 2,
this fraction increases to 94%. These results are in line with those reported in [21].
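
The roles of w, m, and g can be illustrated with a small sketch. It is not the ZEBRA implementation: it assumes non-overlapping windows of w interactions, treats a window as “same user” when at least a fraction m of its predicted interactions match the actual ones, and assumes the user is logged out after g consecutive “different user” windows. The function names are hypothetical.

```python
from typing import List, Optional


def window_decisions(actual: List[str], predicted: List[str],
                     w: int = 20, m: float = 0.6) -> List[bool]:
    """One decision per window of w interactions: True means "same user"."""
    n = min(len(actual), len(predicted))
    decisions = []
    for start in range(0, n - w + 1, w):  # non-overlapping windows (an assumption)
        pairs = zip(actual[start:start + w], predicted[start:start + w])
        matches = sum(a == p for a, p in pairs)
        decisions.append(matches / w >= m)
    return decisions


def logout_window(decisions: List[bool], g: int = 1) -> Optional[int]:
    """Index of the window at which the user is logged out, or None if never.

    Assumes logout after g consecutive "different user" windows.
    """
    failures = 0
    for i, same_user in enumerate(decisions):
        failures = 0 if same_user else failures + 1
        if failures >= g:
            return i
    return None


actual = ["typing"] * 60
predicted = ["typing"] * 20 + ["scrolling"] * 10 + ["typing"] * 30
decisions = window_decisions(actual, predicted)
print(decisions, logout_window(decisions, g=1), logout_window(decisions, g=2))
```

In this toy session the second window matches only 50% of the interactions, so the user is logged out at that window with g = 1 but stays logged in with g = 2.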


Table 3. Confusion matrix of Interaction Classifiers for 15 legitimate user samples.

                          Predicted
Actual          Typing   Scrolling    MKKM
Typing            9662          69     403
Scrolling          118        1127       3
MKKM               622          36    5829

Table 3 shows the confusion matrix of 15 Interaction Classifiers for classifica- 
tion performance. We achieved overall precision of 93.01%, recall of 93.00%, and F- 
Measure of 92.98%. These results are in line with those reported in [12,23] and show 
that similar to their classifiers, our classifiers are very good at correctly recognizing the 
interactions. 
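
The overall figures can be recomputed from Table 3. The sketch below assumes that the reported values are support-weighted averages of the per-class scores (the text does not state the averaging method); under that assumption the table reproduces the stated precision, recall, and F-measure.

```python
# Rows are actual classes, columns are predicted classes (Table 3).
LABELS = ["Typing", "Scrolling", "MKKM"]
CM = [
    [9662, 69, 403],
    [118, 1127, 3],
    [622, 36, 5829],
]


def weighted_scores(cm):
    """Support-weighted precision, recall, and F-measure of a confusion matrix."""
    total = sum(sum(row) for row in cm)
    precision = recall = f_measure = 0.0
    for k in range(len(cm)):
        tp = cm[k][k]
        actual_k = sum(cm[k])                    # row sum: samples of class k
        predicted_k = sum(row[k] for row in cm)  # column sum: predictions of class k
        p = tp / predicted_k
        r = tp / actual_k
        f = 2 * p * r / (p + r)
        weight = actual_k / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return precision, recall, f_measure


p, r, f = weighted_scores(CM)
print(f"precision={p:.2%} recall={r:.2%} F-measure={f:.2%}")
# prints: precision=93.01% recall=93.00% F-measure=92.98%
```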

Fig. 3. Simulated innocent adversaries.

Security against Innocent Adversaries: To evaluate the security of our implementation
of ZEBRA against innocent adversaries, we compute the True Negative Rate (TNR).
Specifically, we compute TNR as the fraction of windows that Authenticator correctly 
outputs as from “different user”. We simulated the innocent adversary by mismatch- 
ing the sequences where an actual interaction sequence from one sample is compared 
against the predicted interaction sequence from a different sample. During this simu- 
lation, the sequences were synchronized by aligning the starting point of the sequence 
being compared. For the threshold of (60-70)%, most (>85%) of the authentication 
windows were correctly identified as mismatching windows for window size above 20 
(as shown in Fig. 3a). Figure 3b shows the fraction of innocent adversaries remaining 
logged in to ZEBRA system for a given number of authentication windows. As shown 
in the figure, all the innocent adversaries or “wrong” users were quickly deauthenticated 
within 5 authentication windows. This shows that our implementation of the ZEBRA 
system is robust against such innocent adversaries. Thus, it serves as a valid implemen- 
tation to test the effectiveness of the VibRaze attack. 
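
The mismatching simulation can be sketched as follows. The sequences are synthetic stand-ins, not data from the study, and the windowing (non-overlapping windows of w interactions, aligned at the start of both sequences) is an assumption made for illustration.

```python
from typing import List


def tnr(actual: List[str], predicted: List[str], w: int = 20, m: float = 0.6) -> float:
    """Fraction of windows that are flagged as "different user"."""
    n = min(len(actual), len(predicted))
    starts = list(range(0, n - w + 1, w))  # both sequences aligned at their starting points
    if not starts:
        raise ValueError("sequences are shorter than one window")
    rejected = 0
    for s in starts:
        pairs = zip(actual[s:s + w], predicted[s:s + w])
        matches = sum(a == p for a, p in pairs)
        if matches / w < m:
            rejected += 1
    return rejected / len(starts)


# Actual interactions from one (synthetic) sample ...
actual = ["typing"] * 20 + ["scrolling"] * 20 + ["MKKM"] * 20
# ... compared against predicted interactions from a different (synthetic) sample.
other = ["MKKM"] * 20 + ["typing"] * 20 + ["MKKM"] * 15 + ["typing"] * 5

print(tnr(actual, other))   # two of the three windows are rejected
print(tnr(actual, actual))  # a matching sequence is never rejected
```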


6.2 Performance of VibRaze Against ZEBRA 


To evaluate the performance of VibRaze against ZEBRA, we consider the interaction 
sequence from 45 samples collected from 15 user sessions as the attack-sample and the 
motion data collected from the victim (or the researcher) with different attack settings as 
a victim-sample. Considering the regular user’s interaction sample as an attack-sample 
reflects that a non-expert individual is playing the role of an attacker who attacks the vic- 
tim at a random point in time, i.e., non-opportunistic attack. Further, taking the motion 
data from different attack settings as a victim-sample indicates that the victim user 
is performing various regular activities, or has placed his watch on different surfaces. 
Since VibRaze considers a keyboard-only attack, we discard all the mouse-related inter- 
actions including MKKM from each of the attack-samples. In our attack analysis, we 
map the interaction sequences from each of the attack-samples with the motion signals 
from the victim-sample by aligning the starting point of these two samples. This map- 
ping represents the scenario where an attacker attempts to access the already logged-in 
terminal when the victim user is performing his routine activities. Below we present the 
performance of VibRaze against ZEBRA in different On-Wrist and Off-Wrist settings. 


On-Wrist Setting: Figure 4 shows the fraction of attackers remaining logged in to
ZEBRA for given ‘n’ authentication windows with the On-Wrist setting. Figure 4a
shows this result with the Call-Vib setting when the user is executing five different
activities. When the victim was driving his car, all the attackers were incorrectly
recognized as legitimate users and were able to remain logged in for the entire
experiment session with the grace period (g) of both 1 and 2. Similar results were
achieved when the victim was using his phone (typing, browsing, playing a game), and
writing some random text on a sheet of paper while wearing the watch on his wrist. With
the user’s typing activities at the remote location, 98% of the attackers succeeded in
remaining logged in when using g = 2, while 72% succeeded when using g = 1. Further,
with the regular wrist movement while sitting on a chair or a sofa, nearly 75% of the
attackers were able to remain logged in on behalf of the user when using g = 2. Even
with the strict grace period of 1, more than 40% of the attackers were able to remain
logged in for the entire session of the experiment.

Fig. 4. On-Wrist Setting. Fraction of attackers remaining logged in to ZEBRA after ‘n’
authentication windows (with w = 20, m = 60%) when the user is undergoing different
wrist-activities.

Figure 4b shows the fraction of attackers remaining logged in for given ‘n’
authentication windows with the Notif-Vib setting. All attackers were able to remain
logged in when the victim was performing any of the following three activities at the
remote location: driving, using the phone, and writing on paper. With the typing
activities of the user, 96% of the attackers succeeded in remaining logged in when using
g = 2, while 68% succeeded when using g = 1. Compared to the Call-Vib setting, when the
user was executing regular wrist activities, a larger fraction (90%) of the attackers
were able to remain logged in for the entire session when using g = 2. With g = 1,
nearly 50% of the attackers remained logged in for the entire session. We also tested
VibRaze with the walking scenario. However, the attack did not succeed in this setting,
potentially because the wrist-motion generated while walking may have dominated the
vibration motion on the watch. Nevertheless, if the VibRaze attack does not succeed at a
given time, the attacker can wait and repeat the attack at a later point in time when
the victim user may be performing other activities.

Thus, these results indicate that in the On-Wrist setup, VibRaze can completely
compromise (i.e., 100% of the attackers can remain logged in) the security of ZEBRA in
most of the scenarios such as driving, using the phone, and writing.

Fig. 5. Off-Wrist Settings. Fraction of attackers remaining logged in to ZEBRA after ‘n’
authentication windows (with w = 20, m = 60%) when the watch is placed on different
surfaces.


Off-Wrist Setting: Figure 5 shows the fraction of attackers who succeeded in remaining
logged in to ZEBRA for ‘n’ authentication windows with the Off-Wrist setting. As
shown in Fig. 5a, with Call-Vib and the watch on the wooden surface, all the attackers
were incorrectly recognized as legitimate users and were able to remain logged in for
the entire session at a grace period (g) of both 1 and 2. When the watch was placed on
top of a pillow or on a sofa, we found that the attack success rate decreases,
potentially because of the vibration-absorbent nature of these surfaces. With the pillow
setup, the fraction of attackers who succeeded in remaining logged in decreases slightly
to 95% with g = 2. This fraction further decreases to 55% with the sofa setup. With
g = 1, 65% of the attackers succeeded in remaining logged in with the pillow setup, and
only 15% with the sofa setup.

In line with the Call-Vib setup, Fig. 5b shows the attack success rate with Notif-Vib.
As with Call-Vib, Notif-Vib resulted in a 100% attack success rate with the watch on
the wooden surface. With Notif-Vib, the attack success rate increases
significantly for the watch on the pillow and on the sofa. For both setups, 98% of
the attackers were able to remain logged in for the entire session with the grace period
of 1. With the grace period of 2, all the attackers stayed logged in for the whole
session. We attribute this significant increase in attack success rate from the Call-Vib
setup to the Notif-Vib setup to the short vibration gap in Notif-Vib.

Similar to the On-Wrist setup, our results show that VibRaze can break the ZEBRA 
system with a high success rate in Off-Wrist setup, especially with Notif-Vib. 


Summary of Results: Our results show that VibRaze can compromise the security of
ZEBRA, enabling a large fraction (nearly 100%) of the attackers to remain logged in
on behalf of the victim user in most of the setups. Under the On-Wrist setup, our results
indicate that ZEBRA is highly susceptible to our remote attack in the scenarios where
the victim user is driving, using his phone, or writing: nearly 100% of the attackers
succeeded in impersonating the victim user in such scenarios. Even in the scenario where
the victim user performs random wrist-activities (miscellaneous) while sitting on a chair
or a sofa, 40-90% of the attackers were able to remain logged in. Interestingly, VibRaze
did not succeed with the walking activities. However, we note that in VibRaze, the
attacker can always wait for another time or another day to launch the attack, when
the victim might be undergoing a non-walking activity. Although in our evaluation with
the On-Wrist setup we use the victim sample from only one user, these results should
apply to different users, since vibration seems to dominate the motion signals
corresponding to other activities in this setting. In the Off-Wrist settings, VibRaze
still succeeds in launching the remote attack: nearly 100% of the attackers succeeded,
especially with Notif-Vib. As the Off-Wrist settings are independent of the
user’s activities, this result applies to all users.

The design of ZEBRA is not limited to the shared terminal scenario such as in a 
hospital scenario as presented in ZEBRA [21]. It can be well extended to lock-unlock 
a personal computer, a laptop, or even a phone similar to other zero-interaction authen- 
tication schemes. Even with such extensions of ZEBRA, VibRaze can still break the 
scheme. For instance, considering the use case of ZEBRA to lock-unlock a home com- 
puter, VibRaze can be launched while the victim is in the office. For this relay attack to 
work, the leech can be installed stealthily at the office location and the ghost near the 
home computer, and the vibration triggers can be sent the same way as we demonstrated 
in this section. 


7 Potential Mitigations 


In this section, we discuss various potential technical mitigation strategies against 
VibRaze and their limitations. We note that designing mitigation strategies and eval- 
uating their effectiveness against VibRaze is beyond the scope of our study. 


Disabling the Apps during Vibration: A natural, and perhaps non-technical, defense
against our attacks would be to disable the ZEBRA system whenever a call or
a notification is received. Alternatively, calls and notifications could be disabled
while the ZEBRA system is running. However, such a mitigation would prevent the user from
receiving calls/notifications while interacting with a ZEBRA-enabled system, and
could degrade the usability of the wearable system.


Re-Design of Classifier: In ZEBRA, Interaction Classifier is trained in such a way 
that it always classifies the motion segment to one of the three interactions — typing, 
scrolling, and MKKM, even if the motion segment corresponds to a different activ- 
ity. The consideration of only three interactions in ZEBRA could be the reason behind 
the success of VibRaze. Therefore, one potential mitigation strategy to defeat VibRaze 
may be the addition of a fourth interaction class (“other”) in the Interaction Classi- 
fier. The “other” class will represent the activities other than the already considered 
interactions, including the ones corresponding to vibration. The addition of the fourth 
interaction may defeat the VibRaze attack since the motion associated with the vibra- 
tion and any other activities now will be classified as “other”. However, the additional 
class for classification may degrade the performance of the classifier, and hence the 
usability of ZEBRA. Further investigation is needed to explore the effect of increasing 
the classes/interactions on ZEBRA, and its performance against VibRaze. 


Distance Bounding Protocols: Several approaches exist to bound the distance between
two devices that can thwart relay attacks, and consequently VibRaze. Such distance
bounding protocols can utilize Received Signal Strength (RSS) [4] or Time-of-Flight
(ToF) for distance estimation between two devices. RSS is a measurement that
shows the strength of the radio signal received by the device. Since the strength of the 
radio signal decreases over the distance from its source, the variation of RSS measure- 
ment can be used to estimate the distance. However, the signal strength of the radio 
signal can be manipulated using a signal amplifier or an attenuator [1]. Therefore, RSS 
is not a reliable and secure method for distance estimation. ToF based protocols employ 
the time elapsed, either Time-of-Arrival (ToA) [11] or Round-Trip-Time (RTT), during 
a message exchange to estimate the distance. Since ToA uses only the propagation time 
of single message exchange, it requires two devices to share a synchronized and high- 
precision clock. This requirement makes ToA infeasible to implement in the real-world 
scenario where devices would have different clocks. RTT measures the time elapsed
during a message exchange, i.e., the time between the transmission of a message and the
reception of its response. However, a small error in the timing measurement (estimating
processing or transmission delay) at one node can result in a large deviation in the
distance estimate [3].
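
The sensitivity of RTT-based estimation to timing errors follows from the speed of light: each nanosecond of unaccounted delay shifts the estimate by about 15 cm. The following worked sketch uses illustrative values (a 1 m true distance and a 1 microsecond responder processing delay are assumptions, not figures from [3]).

```python
C = 299_792_458.0  # speed of light in m/s


def rtt_distance(rtt_s: float, processing_s: float) -> float:
    """Distance estimate from a round-trip time after removing the responder's delay."""
    return C * (rtt_s - processing_s) / 2


true_distance = 1.0   # metres (illustrative)
processing = 1e-6     # assumed responder processing delay of 1 microsecond
rtt = 2 * true_distance / C + processing

for error_ns in (0, 1, 10, 100):
    # The verifier underestimates the processing delay by error_ns nanoseconds.
    estimate = rtt_distance(rtt, processing - error_ns * 1e-9)
    print(f"{error_ns:>3} ns timing error -> {estimate:6.2f} m")
```

With these values, a 100 ns error already turns a 1 m separation into an estimate of roughly 16 m.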


Attack Detection via Logs: The ZEBRA system may be designed to keep the log of 
users accessing the shared terminal. By reviewing the recorded logs, the victim user may 
find out that his system has been compromised, and someone else has used his terminal 
on his behalf. Further, the VibRaze attack may leave long term traces (e.g., messages, 
call logs) on the companion device, the phone, that can be correlated with the attack 
based on their time. However, by the time users realize that the system has been com- 
promised, the attacker might have already fulfilled his malicious intention. Moreover, 
users might not be concerned about the security or be diligent enough to review the 
logs carefully and frequently. Several research studies (e.g., [5,26]) on user-centered
security have shown that users may not pay attention to security notifications or
heed security warnings and messages. Moreover, relying upon the users to detect
such attacks will break the zero-effort property of ZEBRA.


8 Related Work 


One work that is closely related to our VibRaze attack is Sound-Danger [22]. Sound- 
Danger is an attack system against Sound-Proof [13], a zero-effort two-factor authen- 
tication system that leverages ambient sounds to detect the proximity between the 
second-factor device (phone) and the login terminal (browser). Similar to VibRaze, 
Sound-Danger attack system is based on making a phone call and sending a message 
to create predictable or previously known sounds. However, the design of VibRaze is
completely different from Sound-Danger: Sound-Danger relies on the creation of
predictable sounds on the phone, while VibRaze relies on the creation of vibration on
the watch to match the wrist-motion of the user with typing.

Similar to ZEBRA, WACA [2] is also a wearable-assisted continuous authentication 
system, which is based on sensor-based keystroke dynamics. WACA operates by deriv- 
ing users’ keystroke dynamics profile via the built-in sensors of a wrist-wearable, and 
periodically and transparently comparing the derived keystroke dynamics with the reg- 
istered profile of the initially logged-in user. Rigorous future research would be needed 
to explore whether WACA is vulnerable to VibRaze. 

Extensive research literature has shown the feasibility of the ghost-and-leech relay 
attacks in various short-range wireless channels. For instance, Francillon et al. [7] and 
Kfir et al. [14] have shown how a relay attack can be mounted against the system using 
RFID communication channel. Specifically, the relay attack in [7] is mounted against 
Passive Keyless Entry and Start (PKES) system used in modern cars to open and start 
the cars. In [14], the relay attack is executed against contactless smartcard system that 
provides low cost “no-touch” authentication. Francis et al. [8] have demonstrated the 
relay attacks against the system that uses NFC communication. Further, with the aim of 
impersonation, Levi et al. [18] have shown a relay attack on Bluetooth authentication 
protocol. 


9 Conclusion and Future Work 


As a representative instance of zero-effort deauthentication, ZEBRA is an appealing and
useful proposition. In this paper, we presented VibRaze, a new attack vector against
ZEBRA that can compromise the security of the system even when the attacker is
located remotely, far away from the victim user. VibRaze comprises launching a
ghost-and-leech relay attack and creating vibrations (e.g., through a phone call or
notification) on the wearable device located remotely. We evaluated this attack system
against ZEBRA considering various real-life activities that the user may perform at
a remote location and demonstrated that ZEBRA is highly susceptible to the attack
(succeeding with a 100% success probability in most of the cases). Our work shows
how vibration, a seemingly innocuous and highly ubiquitous means of device-to-user
interaction, can be exploited for offensive purposes. Although this remote attack is
fundamental in nature and challenging to address, we presented some potential mitigation
approaches that may be used to alleviate the threat of the exposed vulnerability.

Since the first step of VibRaze, i.e., mounting the relay attack, can be easily and
stealthily performed, in this study we mainly focused on the second step of VibRaze,
i.e., creating remote vibrations on the user’s watch. Further, we evaluated VibRaze by
closely simulating Call-Vib and Notif-Vib via a programmatic method, rather than
leveraging the actual vibration produced when a call or notification is received.
Rigorous future work would be needed to have an end-to-end implementation of VibRaze
with actual in-device vibration and to evaluate the performance of such an
implementation against ZEBRA. Since the success of VibRaze relies on the duration of
vibration on the victim’s watch, future work is needed to quantify how often (or for how
long) users ignore calls/notifications and leave their devices vibrating, and for what
kinds of calls/notifications (e.g., from known vs. unknown callers).


Abstract. Zigbee is an energy-efficient wireless IoT protocol that is
increasingly being deployed in smart home settings. In this work, we ana- 
lyze the privacy guarantees of Zigbee protocol. Specifically, we present 
ZLeaks, a tool that passively identifies in-home devices or events from the 
encrypted Zigbee traffic by 1) inferring a single application layer (APL) 
command in the event’s traffic, and 2) exploiting the device’s periodic 
reporting pattern and interval. This enables an attacker to infer user’s 
habits or determine if the smart home is vulnerable to unauthorized 
entry. We evaluated ZLeaks’ efficacy on 19 unique Zigbee devices across
several categories and 5 popular smart hubs in three different scenarios:
controlled RF shield, living smart-home IoT lab, and third-party Zigbee
captures. We were able to i) identify unknown events and devices (without
a-priori device signatures) using the command inference approach with
83.6% accuracy, ii) automatically extract device’s reporting signatures, 
iii) determine known devices using the reporting signatures with 99.8% 
accuracy, and iv) identify APL commands in a public capture with 91.2% 
accuracy. In short, we highlight the trade-off between designing a low- 
power, low-cost wireless network and achieving privacy guarantees. We 
have also released ZLeaks tool for the benefit of the research community. 


Keywords: Zigbee · IoT · Device identification · Passive inference


1 Introduction 


Smart home products (e.g., bulbs, outlets, sensors, etc.) allow users to control 
and monitor their smart home’s environment wirelessly, but unfortunately, pose 
a significant risk to users’ privacy. Prior studies have demonstrated that by 
intercepting the IP traffic of a smart home, the attacker can determine in-home 
devices [1-3], events [4,5], and user’s habits [6]. In practice, these attacks are 
difficult to carry out, as the attacker must find a vulnerability to capture the 
user’s IP network traffic (e.g., by gaining root access to the home router). Yet, 
there exists an easy privacy violation attack, i.e., simply sniffing the Internet of
Things (IoT) wireless protocol (e.g., Zigbee) transmissions that are
unintentionally emitted up to hundreds of feet away. Although the IoT traffic is
encrypted to prevent eavesdropping, researchers recently showed that the attacker
can still identify events using a-priori device signatures [7,8] and infer a few
encrypted Zigbee (Network layer) commands by exploiting the payload lengths [9].

In this work, we analyze the privacy guarantees of one of the most popular IoT 
wireless protocols, Zigbee [10], that is increasingly being used in smart hubs such 
as Amazon Echo Plus, Samsung SmartThings, and Philips Hue. With the launch 
of more than 500 new Zigbee-certified devices in 2020 alone and the expected 
sale of nearly four billion Zigbee chipsets by 2023 [11], Zigbee continues to be 
the preferred choice of device manufacturers. 

Our key insight is that design optimizations incorporated into Zigbee to 
enable low-latency communication on low-cost resource-constrained devices fun- 
damentally leak information, e.g., to keep the frame length small, Zigbee per- 
forms encryption transformation [10] on AES encrypted output to match the 
message length. This enables an eavesdropper to exploit unpadded payload 
lengths and discrepancies in traffic metadata to infer every encrypted network 
layer (NWK) and application layer (APL) command. Moreover, to prevent device 
timeout, Zigbee devices periodically report attributes like battery level,
temperature, etc., to the smart hub. The distinct reporting patterns and intervals
inadvertently serve as device fingerprints. In this work, we exploit devices’ unique
reporting patterns and the possibility of inferring APL commands to passively
determine devices and events in the target network. Specifically, we make the
following contributions.


Device and Event Identification Using Inferred APL Command: We 
demonstrate that the event traffic of a device always includes at least one 
functionality-specific APL command (such as Door Lock/Unlock), which alone 
specifies the triggered event (i.e., lock/unlock) and the functional device type 
(i.e., door lock). The Zigbee Cluster Library (ZCL) specification [12] inherently
leaks information about all such APL commands. We attempt to infer a single
functionality-specific APL command in the encrypted event traffic to determine the
event and device type, and combine it with the manufacturer’s identity obtained
from the Organizationally Unique Identifier (OUI) of the device’s MAC address to
identify a particular Zigbee device. Unlike prior works [7,8], this approach does not
require a device’s event signatures and can even identify unknown events and devices¹.

In practice, inferring functionality-specific APL commands is extremely
challenging, and so far, no study has attempted it. This is because the metadata of
functionality-specific APL commands is very similar to that of a hundred other
generic APL commands. A few APL commands are also manufacturer-configurable,
which prevents us from exploiting only the payload length, packet direction,
and radius (hops) to infer APL commands using the prior NWK command
inference approach [9]. We utilize frame format guidelines [12] to identify all pos- 
sible APL commands with payload lengths overlapping with the functionality- 
specific APL commands and their response commands (if any), e.g., door unlock 
request and response. The discrepancies in the traffic’s metadata, together with 
the device’s logical type (electricity-powered or battery-powered), are used to 
construct inference rules for each target functionality-specific APL command. 


¹ Zigbee devices not previously observed, i.e., no a-priori access to their traffic.
Device Identification Using Periodic Reporting Patterns: Zigbee devices
periodically report attributes to the smart hub. We exploit reporting patterns 
and intervals to create unique device fingerprints. This approach is useful for 
identifying a known device with an unpatched vulnerability (e.g., to spread malware)
in a Zigbee network that has minimal user activity. Unlike prior
works [7,8], which analyze only the Zigbee traffic generated by events,
this approach can identify devices even when no event is triggered. Given that
every device’s current consumption varies with its communication pattern
and hardware, the periodic reporting time is not trivial to modify, as it directly
affects the device certification requirement of a minimum 2-year battery life [13].


Automating Event and Device Identification with ZLeaks Tool: We 
developed a comprehensive privacy analysis tool for Zigbee protocol, named 
ZLeaks [14], that automates the aforementioned identification techniques. 
ZLeaks takes the Zigbee traffic as input and passively determines events and 
devices in the smart home. It can also extract devices’ reporting signatures 
automatically. 

We experimentally evaluated ZLeaks on by far the most extensive device 
set used in privacy analysis of Zigbee protocol including 5 popular smart hubs 
(SmartThings, Amazon Echo Plus, Philips Hue, OSRAM Lightify, and Sengled) 
and 27 commercial off-the-shelf Zigbee devices, out of which 19 devices were 
unique. The experiments were performed in 1) an isolated RF shield and 2) a liv- 
ing smart-home “Mon(IoT)r Lab” [15] with multiple IoT and non-IoT networks 
operating simultaneously. Furthermore, we validated the findings on third-party 
capture files available on Wireshark [16] and Crawdad [17] forums. Our results 
indicate that ZLeaks identified event and device information using inferred APL 
commands with 83.6% accuracy and devices using reporting patterns with 99.8% 
accuracy. Also, we inferred functionality-specific APL commands in a public Zig- 
bee capture, using our command inference rules, with 91.2% accuracy. 


2 Background and Motivation 


2.1 Zigbee Overview 


Zigbee is one of the most popular low-cost, low-power, wireless protocols specif- 
ically designed for battery-powered applications in smart ecosystems such as 
smart homes and industries. Zigbee is built on top of the low data-rate IEEE 
802.15.4 wireless personal area networking (PAN) standard and implements the 
physical (PHY) and medium access control (MAC) layers as defined by the IEEE 
standard. Most commercial Zigbee devices operate at a data rate of 250 kbps in 
the 2.4 GHz band (divided into 16 channels, each 5 MHz apart). Some Zigbee
devices also operate in the unlicensed frequency bands of 784, 868, and 915 MHz. 
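
The 16-channel layout can be made concrete with a short sketch. It assumes the IEEE 802.15.4 numbering for the 2.4 GHz band (channels 11 to 26, the first centered at 2405 MHz), which the text above does not state; the function and variable names are illustrative.

```python
def channel_center_mhz(channel: int) -> int:
    """Center frequency in MHz of a 2.4 GHz IEEE 802.15.4 channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz channels are numbered 11 through 26")
    # Channels are spaced 5 MHz apart, starting at 2405 MHz for channel 11.
    return 2405 + 5 * (channel - 11)


# Table of all 16 channels that a sniffer would step through when scanning.
CHANNELS = {ch: channel_center_mhz(ch) for ch in range(11, 27)}
```

A sniffer that does not know the target network's channel in advance would iterate over this table until it observes traffic.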


Network Architecture: Zigbee supports both centralized and distributed network
architectures. Centralized networks comprise three logical device types:
Zigbee coordinator (ZC), Zigbee router (ZR), and Zigbee end-device (ZED), 
Fig. 1. Zigbee’s Protocol Stack comprising PHY, MAC, NWK and APL layers


while the distributed networks have ZR and ZED only. ZEDs do not route traffic 
and may sleep to conserve battery, making them appropriate for battery-powered 
devices (e.g., sensors, door locks). ZRs are responsible for routing traffic between 
nodes and storing messages intended for ZEDs until they are requested. Every 
Zigbee network has one ZC that is responsible for network formation, issuing 
network identifiers, and logical network addresses. ZC also acts as a trust center 
to authenticate new nodes and distribute keys. ZRs and ZCs are powered devices 
(e.g., bulbs, smart hubs) and do not sleep during the network’s lifetime. Besides, 
Zigbee supports connectivity in star, mesh, and tree topologies. Zigbee does not 
implement MAC address randomization. Each Zigbee node has a manufacturer- 
assigned 64-bit MAC (extended) address that is mapped to a unique 16-bit 
network (logical) address by the ZC during device pairing. The logical address 
is used for routing, while the extended address is used for authentication. 


Zigbee Protocol Stack (Fig. 1): The Zigbee standard [10] defines the functionalities
of the Network and Application layers. The Network layer is responsible
for network formation and management, routing, and address allocation. There
are 12 NWK commands, such as Link Status, Route Record, and Route Reply.
Zigbee’s Application layer comprises the Application Support (APS) sublayer,
Zigbee Device Object (ZDO), and Application Framework. APS sublayer main- 
tains binding tables and address mappings, and ZDO implements the device 
in one of the three logical roles (ZC, ZR, or ZED). The application framework 
offers pre-defined profiles (e.g., home automation, health care, etc.) and func- 
tional domains called clusters (e.g., lighting, security, etc.) for end-manufacturers 
to support device interoperability. Broadly, APL commands are either function- 
ality specific or generic (such as Read Attributes, Report Attributes etc.). 


Security and Privacy: Zigbee uses 128-bit AES encryption to provide payload 
confidentiality and message authentication. The standard also has the provision 
for integrity-protection using 128-bit AES CCM* block cipher and replay protec- 
tion using a 32-bit frame counter. Each Zigbee device has a pre-installed global 
trust center link key, which is used if the manufacturer does not provide any 
unique link key or QR install code. The Network (encryption) key is randomly 
generated by ZC during network formation and is common to all Zigbee nodes. 
Fig. 2. Communication Flow in a Zigbee Home Network.


2.2 System and Threat Model 


We assume a Zigbee home network, similar to Fig. 2, where a smart hub (ZC) 
is paired with several popular Zigbee devices (ZRs and ZEDs). The hub is con- 
nected to an IP gateway to update devices’ states on the cloud and the user’s 
smart app. The smart home’s occupants carry out routine activities and can 
control devices via the smart app from virtually anywhere. We assume that a 
passive attacker is collecting Zigbee transmissions using a wireless Zigbee sniffer 
from within the wireless communication range of the victim network. We use TI 
CC2531 Zigbee sniffer [18], equipped with the standard omnidirectional antenna, 
to receive Zigbee transmissions at a distance of 20 m.² The attacker does not need
access to the smart app or physical presence inside the smart home; he can even 
implant a Zigbee sniffer nearby and observe the traffic remotely. 

The attacker analyses the captured Zigbee traffic to passively identify the
events and devices using either command inference or periodic reporting patterns,
without needing the network or link keys, the device’s QR code, or specific events
such as device pairing or rejoining, which would help the attacker extract the Network
key. In other words, we assume a fully operational Zigbee network with subject
devices (door locks, bulbs, outlets, and various types of sensors) configured and 
commissioned a-priori. The attacker only requires some background knowledge of 
the Zigbee standard. There is no need to collect event signatures for each device. 
Only when a specific device is required to be identified in the target smart home 
with zero user activity, the attacker needs the device’s reporting signatures. Note 
that Zigbee packets are exchanged between the hub and end-devices only; so even 
having access to the user’s smart app and reverse engineering it would not leak 
information regarding the Zigbee commands. 


Challenges: The AES-128 algorithm used by Zigbee has proven confusion and
diffusion properties and prevents an eavesdropper from reading encrypted payloads.
The attacker can resort to using
the existing NWK command inference scheme [9] based on payload size, radius,
and actively determined logical device type to infer NWK frames. Unfortunately,
the event and device information is embedded in APL commands, where the
radius is insignificant. Also, unlike the 12 NWK commands, which have defined
payload lengths [10], there are more than a hundred APL commands, most of 


² Range can be extended with a high-gain directional antenna.
Fig. 3. Inference Strategy: If an event occurs, infer the functionality-specific APL command
and combine MAC identifier to identify the device and event. If there’s no event, 
identify device using periodic signature correlation. If it fails, wait for an event. 


which are manufacturer configurable (e.g., Report Attributes, Read Attributes,
etc.). Hence, several commands overlap at each payload length. These factors
make the existing approach [9] insufficient to passively infer APL commands.
The unencrypted IEEE 802.15.4 frames in the Zigbee traffic also provide 
negligible information regarding the devices and events, e.g., the frequently 
exchanged IEEE 802.15.4 ACK carries no network or MAC address for
the source or destination, and the incremental frame sequence numbers wrap around
after 256 values, making it extremely challenging to trace the communicating nodes.
Moreover, existing research studies rely on a-priori event signatures for the 
identification of events [7,8]. In practice, user events are infrequent, e.g., during 
nighttime. In this idle state, the devices and hub exchange periodic reports only 
and do not leak any device information. Hence, identifying a device without event 
signatures or in the absence of events are still open problems for the attacker. 


3 Passive Inference Attacks on Zigbee 


3.1 Attack Overview 


As illustrated in Fig. 3, our fundamental goal is to invade the smart home’s
privacy by determining Zigbee devices, triggered events, and encrypted commands
exchanged in the home. We use a low-cost wireless Zigbee receiver, the TI
CC2531 [18], to identify and tune to the target network’s communication frequency
channel and sniff the Zigbee traffic. To maximize the amount of information
extracted from the sniffed traffic, we first perform network mapping,
whereby the logical device type of each node (ZC, ZR, or ZED) is determined.

If an event occurs, we use the proposed inference rules (Sect. 3.3) to identify
the functionality-specific APL command in the event’s traffic, which further
reveals the event and device type. The manufacturer is revealed from the device’s
MAC identifier. Specifically, we exploit the device’s logical type and metadata
variations in APL commands, which stem from power consumption optimizations
incorporated into Zigbee. Unlike prior works [7,8], we do not require a-priori
event signatures for every device and can infer unknown devices and events.

In addition, we leverage the device’s reporting pattern and interval to create
unique reporting signatures (Sect. 3.4). Whenever a known device with an
unpatched vulnerability needs to be identified in the target network with no

event triggers, we correlate the device’s reporting signature with the reporting 
pattern and interval of every device in the target’s Zigbee traffic. If the reporting 
signatures are unavailable, we wait for an event to identify the device using com- 
mand inference. To the best of our knowledge, no prior work has demonstrated 
device identification, using APL commands, without collecting event signatures 
or through periodic reporting patterns. Below we explain the attack phases. 


3.2 Passive Network Mapping 


To keep the frame length small, Zigbee uses the logical address for routing and the
source’s MAC address for authentication, and excludes the destination’s MAC
address. Thus, to identify the type and model of the target device, it is essential
to keep a mapping of logical address, MAC address, and logical type (ZC, ZED,
or ZR) for each logical address (i.e., node) in the traffic. The Zigbee specification [10]
identifies ZC as the node having 0x0000 logical address. We observed that for 
IEEE 802.15.4 Data Requests specifically, the source node is ZED and the des- 
tination node (other than 0x0000) is ZR. In addition, we recognized ZRs as the 
destination node of any Zigbee frame that has source routing information in the 
metadata, and that node does not send IEEE 802.15.4 Data Requests. ZR can 
also be identified as the source of NWK commands namely Link Status, Rejoin 
Response, and Network Report, provided the node address is not 0x0000 [9]. 
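
The mapping heuristics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ZLeaks implementation: the `Frame` record and its field names are hypothetical stand-ins for attributes parsed from a capture.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

ZC_ADDR = 0x0000  # the coordinator always holds logical address 0x0000
ZR_ONLY_NWK_COMMANDS = {"Link Status", "Rejoin Response", "Network Report"}


@dataclass(frozen=True)
class Frame:
    """Hypothetical per-frame metadata; field names are illustrative."""
    src: int
    dst: int
    is_data_request: bool = False      # IEEE 802.15.4 Data Request
    source_routed: bool = False        # frame carries source routing information
    nwk_command: Optional[str] = None  # inferred NWK command name, if any


def map_logical_types(frames: Iterable[Frame]) -> Dict[int, str]:
    """Label each logical address as ZC, ZR, or ZED using the rules of Sect. 3.2."""
    frames = list(frames)
    types: Dict[int, str] = {}
    data_request_senders = {f.src for f in frames if f.is_data_request}

    for f in frames:
        if ZC_ADDR in (f.src, f.dst):
            types[ZC_ADDR] = "ZC"
    for addr in data_request_senders:
        if addr != ZC_ADDR:
            types[addr] = "ZED"  # only end devices poll with Data Requests
    for f in frames:
        if f.is_data_request and f.dst != ZC_ADDR:
            types.setdefault(f.dst, "ZR")
        if f.source_routed and f.dst != ZC_ADDR and f.dst not in data_request_senders:
            types.setdefault(f.dst, "ZR")
        if f.nwk_command in ZR_ONLY_NWK_COMMANDS and f.src != ZC_ADDR:
            types.setdefault(f.src, "ZR")
    return types
```

Addresses that match none of the rules are simply left unlabeled until more traffic is observed.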


3.3 Device and Event Identification Using Inferred APL Command 


Although devices exhibit unique event patterns, the event traffic of devices with the
same function always includes the same functionality-specific APL command, e.g.,
bulbs use the Color Control command for a color change. This happens because device
manufacturers use defined Zigbee clusters to support vendor interoperability.
This is validated by the official Zigbee compliance documents, e.g., Lightify [19]
and Sengled [20] bulbs use the same APL commands. Below we describe our
scheme to devise and use command inference rules to identify events and devices.


Inference Algorithm: The functionality-specific APL commands of interest
(OnOff, Color Control, Level Control, Lock/Unlock, and Zone Status (short for
IAS Zone Status Change)) have fixed payload lengths. However, they overlap
with several generic APL commands within the encrypted traffic. This
happens because there are more than a hundred APL commands, many of which
are manufacturer configurable and only have a minimum payload and attribute size
specified in the standard [12]. Thus, a command xyz with a minimum 10-byte
payload and 3-byte attribute size has possible payload lengths of 10, 13, 16 bytes, etc.
As shown in Fig. 4, to devise an inference rule for a functionality-specific APL
command, we utilize APL frame formats [12] to first identify all APL commands
that have overlapping payload lengths and packet direction with the target command
and its response command (if any), e.g., Door Lock/Unlock request and
response. Next, a test event is triggered, and overlapping commands are differentiated
based on the logical device type and metadata variations (e.g., network




[Figure 4 (flowchart). Left panel, "Devising APL command inference rules": select a functionality-specific APL command; find overlapping APL commands (src, dst, payload lengths, response command); leverage metadata and device functionality to devise the rule. Right panel, "Analyzing encrypted Zigbee capture": filter APL commands sent/received by each device; use inference rules to identify the functionality-specific APL command, event type, and device type; use the ZCL spec and command frequency to infer the event/device; check if the MAC OUI is real.]

Fig. 4. Strategy to identify devices and events from inferred APL commands. 


Table 1. Identifying devices and events from inferred commands. (Resp = Response,
D = ZED/ZR, ND = NWK discovery, Prec = preceding packet, * = burst repeats,
** = broadcast, (x) = payload len)

Target     | Inference Rule                        | Command          | Device Type   | Event
ZC-ZED(11) | Resp = (12 || 21)                     | Lock/unlock      | Door lock     | Lock/unlock
ZC-D(11)   | Resp = 13 || 15 != 12**, ND = 1 != 11 | OnOff            | Outlet/bulb   | On/off
ZC-D(14)   | Rep, Prec != 17**                     | Level Control    | Smart bulb    | Level changed
ZC-D(15)   | ND = 1, Resp != 12                    | Color Control    | Smart bulb    | Color changed
ZED-ZC(17) | Prec != 13                            | Zone Status (1*) | Motion sensor | Motion
           |                                       | Zone Status (1)  | Door sensor   | Open/close
           |                                       | Zone Status (2)  | Flood sensor  | Water leakage
           |                                       | Zone Status (3)  | Audio sensor  | Audio detected


discovery, end device initiator, etc.). As seen in Table 1, a command inference 
rule specifies properties of APL commands that must be present in the event 
burst (traffic). For instance, an APL command of payload length 11 bytes, sent 
from ZC to ZED is Lock/unlock command if the response packet (ZED to ZC) 
is 12 or 21 bytes. Since devices with the same function use the same functionality-specific
commands, the inference rules constructed for a certain device also hold true
for other manufacturers’ devices. We stress-checked the rules against 200 MB
of Zigbee captures from our devices and third-party sources [17,21]. Note that
most APL commands (like Color Control) directly reflect the event and device
type. However, for outlets and bulbs, which use the same OnOff command, the device
type is indistinguishable until an additional event, e.g., a color change, is triggered.
For the Zone Status command, we observed behavioral consistencies that allowed us
to differentiate various types of sensors; e.g., Zone Status appears twice in the
event burst for the flood sensor and thrice for the audio sensor. For motion and door
sensors, Zone Status appears only once. However, we noticed that for motion
sensors, the same burst pattern repeats after a few seconds.
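
To make the rule structure concrete, the following sketch encodes only the parts of the scheme that are spelled out in the prose: the set of possible payload lengths of a configurable command, the Lock/unlock rule (11-byte command from ZC to ZED with a 12- or 21-byte response), and the Zone Status repetition counts. Function names and string labels are hypothetical, and the remaining rules of Table 1 are deliberately left out.

```python
from typing import List, Optional


def possible_payload_lengths(min_payload: int, attr_size: int, max_len: int) -> List[int]:
    """Payload lengths a configurable command can take, e.g. 10, 13, 16, ..."""
    return list(range(min_payload, max_len + 1, attr_size))


def is_lock_unlock(src_type: str, dst_type: str, payload_len: int, response_len: int) -> bool:
    """Lock/unlock rule: 11-byte command ZC -> ZED answered by 12 or 21 bytes."""
    return (
        src_type == "ZC"
        and dst_type == "ZED"
        and payload_len == 11
        and response_len in (12, 21)
    )


def sensor_from_zone_status(count: int, burst_repeats: bool = False) -> Optional[str]:
    """Map the number of Zone Status commands in an event burst to a sensor type."""
    if count == 1:
        # Motion sensors repeat the same burst after a few seconds; door sensors do not.
        return "motion sensor" if burst_repeats else "door sensor"
    return {2: "flood sensor", 3: "audio sensor"}.get(count)
```

For instance, `possible_payload_lengths(10, 3, 16)` reproduces the 10, 13, 16 byte example given for command xyz, which is why a fixed-length target command can collide with several generic ones.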


Identifying Events and Devices: We first filter all APL commands in the 
event’s traffic sent to or received by the target logical address (e.g., 0xabcd)
Fig. 5. Strategy to identify devices using periodic reporting patterns.


and discard any duplicate packet. We observed that the functionality-specific
command is generally the first APL command of the event burst; hence we also
discard bursts that do not have any frame with the target payload lengths (11-17 bytes)
in the initial half of the burst. If a command with a target payload
length exists, we use Table 1 to identify the APL command, event, and device
type. Finally, we combine the manufacturer’s identity extracted from the device’s
MAC OUI (e.g., PhilipsL) to identify the device. Note that identifying the exact device
depends on the MAC OUI showing the real manufacturer, rather than the
system-on-chip (SOC) manufacturer, e.g., SiliconL. In essence, we can passively
identify unknown events and devices from the target functional domains (bulbs, 
outlets, door locks, and sensors) without the Network key or event signatures. 
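
The final combination step can be illustrated as follows. The OUI table is a two-entry excerpt that we assume for illustration (it should be checked against the IEEE registry), and the function names are hypothetical; the point is only that an SoC vendor OUI yields no manufacturer identity.

```python
from typing import Optional, Tuple

# Illustrative excerpt (assumed; verify against the IEEE OUI registry).
# The flag marks whether the OUI names the device maker or only the SoC vendor.
OUI_TABLE = {
    "00:17:88": ("Philips Lighting", True),
    "00:0D:6F": ("Ember", False),
}


def oui_of(mac: str) -> str:
    """First three bytes of a colon-separated 64-bit extended address."""
    return ":".join(mac.upper().split(":")[:3])


def identify(mac: str, device_type: str, event: str) -> Tuple[Optional[str], str, str]:
    """Combine the OUI-derived manufacturer with the inferred device type and event."""
    vendor, is_device_maker = OUI_TABLE.get(oui_of(mac), (None, False))
    manufacturer = vendor if is_device_maker else None  # SoC OUI reveals no device maker
    return manufacturer, device_type, event
```

With an SoC OUI the device type and event are still reported, which corresponds to the case where only the functional domain of the device can be named.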


3.4 Device Identification Using Periodic Reporting Patterns 


Zigbee devices periodically report their status (battery level, firmware upgrades, 
etc.) to ZC. Since every functional device has a different power consumption, manufacturers
tune the periodic reporting frequency and specific frame attributes
to comply with the Zigbee certification requirement of a minimum 2-year battery
life [13]. The discrepancies in reporting patterns and intervals allow us to devise
unique device fingerprints and identify devices even when no event occurs (e.g., 
during office hours). Unlike event bursts, reporting bursts have no functionality- 
specific APL command and do not directly reveal device’s identity. 


Devising Periodic Reporting Signatures: As shown in Fig. 5, to devise
a device’s periodic reporting signature, we put the device in the idle state and
filter all APL commands exchanged between the device and ZC. After discarding
duplicate packets, we analyze the traffic to determine at least three bursts with
same reporting pattern and interval. Thus, the signature $sign_i$ is a sequence of
APL frames $f_i$ defined using logical device type of source (src) and destination
(dst), payload length (pl), and reporting interval (RI) and is represented as:


$$sign_i = \{f_1, f_2, f_3, \ldots\}, \quad \text{where } f_i = \{src_i, dst_i, pl_i, RI_i\} \qquad (1)$$


Identifying Devices: We first filter APL commands from the traffic and discard
duplicate packets or bursts with any functionality-specific command. Next,
we look for two similar bursts and correlate the observed pattern and interval 
with the available signature set to identify the device. In the rare case that two signatures
collide, we use additional attributes like the MAC OUI to make a decision.
If no reporting signatures are available for a device, we wait for an event burst 
to identify the device using command inference approach (Sect. 3.3). 
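
The correlation step can be sketched as below. This is a simplified reading of Eq. (1), assuming a single reporting interval per signature and a matching tolerance of a few seconds; both simplifications, as well as all names, are ours and not taken from ZLeaks.

```python
from typing import Dict, List, Sequence, Tuple

# A burst pattern is a sequence of frames (src type, dst type, payload length).
Pattern = Tuple[Tuple[str, str, int], ...]
# A signature pairs a pattern with its reporting interval in seconds.
Signature = Tuple[Pattern, float]


def match_signatures(
    bursts: Sequence[Tuple[float, Pattern]],
    signatures: Dict[str, Signature],
    tolerance_s: float = 5.0,  # assumed slack for timing jitter
) -> List[str]:
    """Return the names of signatures matched by two similar consecutive bursts.

    `bursts` holds (timestamp, pattern) pairs of one node, sorted by time.
    """
    matches: List[str] = []
    for (t1, p1), (t2, p2) in zip(bursts, bursts[1:]):
        if p1 != p2:
            continue  # the approach needs two similar bursts
        interval = t2 - t1
        for name, (pattern, reporting_interval) in signatures.items():
            if (
                pattern == p1
                and abs(interval - reporting_interval) <= tolerance_s
                and name not in matches
            ):
                matches.append(name)
    return matches
```

If more than one name is returned, the signatures collide and an additional attribute such as the MAC OUI is needed to decide; an empty list corresponds to waiting for an event burst.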


4 Experimental Setup and Results 


4.1 Automating Passive Inference Attacks with ZLeaks Tool 


To automate the inference attacks depicted in Fig. 3, we developed a command-line
tool in Python, named ZLeaks. ZLeaks takes a Zigbee PCAP capture as input
and determines the event occurrences and devices in the network. While in the
vicinity of the target network, the attacker can run ZLeaks on a laptop or an
embedded board like a Raspberry Pi with a single command. ZLeaks extracts
all APL commands from the captured traffic and uses the Pyshark library [22] to
parse the required frame attributes (e.g., payload length, logical types of nodes, etc.)
into a temporary CSV file for analysis. ZLeaks then attempts to identify events
and devices using either the proposed APL inference rules (Sect. 3.3) or available
reporting signatures (Sect. 3.4). Note that the attacker can automatically extract
reporting signatures of an idle Zigbee device using ZLeaks Signature Extractor. 
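
As a rough idea of the analysis stage that follows parsing, the sketch below groups frame attributes from a CSV by logical address and drops retransmissions. The column names (`src`, `dst`, `seq`, `payload_len`) and the use of a per-source sequence number to spot duplicates are assumptions made for illustration; the actual ZLeaks CSV layout is not described here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

Frame = Tuple[str, str, int]  # (source address, destination address, payload length)


def frames_by_node(csv_text: str) -> Dict[str, List[Frame]]:
    """Group APL frames by every logical address that sent or received them."""
    per_node: Dict[str, List[Frame]] = defaultdict(list)
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["src"], row["seq"])
        if key in seen:  # retransmission of a frame already recorded
            continue
        seen.add(key)
        frame = (row["src"], row["dst"], int(row["payload_len"]))
        per_node[row["src"]].append(frame)
        per_node[row["dst"]].append(frame)
    return dict(per_node)


SAMPLE = (
    "src,dst,seq,payload_len\n"
    "0x0000,0xabcd,7,11\n"
    "0x0000,0xabcd,7,11\n"
    "0xabcd,0x0000,3,12\n"
)
```

Calling `frames_by_node(SAMPLE)` leaves one 11-byte command and one 12-byte response for node 0xabcd, which is the shape of input the inference rules operate on.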


4.2 Experimental Setup 


Our device set comprised 27 commercial off-the-shelf Zigbee devices (ranging
from bulbs, locks, and outlets to various sensors) that were selected based on their
popularity on Amazon and manufacturer diversity. Among the 27 devices, 19
were unique, while a few non-unique devices were purchased from a different
source and tested to ensure that the evaluation results for a particular device
and model remain consistent. Furthermore, while we used 11 unique devices to
formulate the inference strategy, we set aside 8 unique devices, at least one from
each functional domain, as the unknown devices for the sole purpose of evaluation.
The known and unknown devices are listed for reference in Table 3 and
Table 4, respectively. The tests were conducted with 2 universal (manufacturer-independent)
hubs, SmartThings and Amazon Echo Plus, and 3 vendor-specific
hubs: Philips Hue Bridge 2.1, Sengled Z02-hub, and Lightify Gateway. This is
by far the most extensive Zigbee device set used to evaluate the Zigbee protocol.
We evaluated the ZLeaks identification techniques in the following three settings:


RF Shield: It was used to i) study devices’ response to event triggers while 
devising command inference rules, ii) collect the device’s reporting pattern, and 
iii) perform a controlled evaluation of ZLeaks by simultaneously pairing multiple 
devices with each hub. As depicted in Fig. 6, the RF shield was connected to the
gateway to provide continued Internet access to ZC placed inside the shield. To 
sniff the Zigbee communication, a standard omnidirectional antenna (inside the 
shield) was connected via an SMA cable to a low-cost TI CC2531 wireless Zigbee 
sniffer [18] plugged into the laptop (outside the shield). 
Fig. 6. Experimental Setup for analyzing Zigbee Devices: SMA cable connects sniffer’s
antenna (inside RF shield) with TI CC2531 sniffer connected to the laptop outside. 


IoT “Living Lab”: It is a realistic noisy IoT lab named “Mon(IoT)r Lab” at 
Northeastern University [15], which has more than 100 smart devices already 
connected over several wireless networks, along with various non-IoT networks. 


Public Captures: We used Zigbee captures from i) the Wireshark forum [16] and
ii) prior captures [9] available on Crawdad [17] to show that ZLeaks is independent
of the evaluation testbed and device set, and works for unknown devices. We
verified the results using the Network keys available with the capture files. Both
captures contained only event bursts and did not include enough reporting
patterns to evaluate the periodic reporting approach. 


4.3 Evaluation Metrics 


We evaluated ZLeaks using three parameters: 1) inferred APL commands, 2)
event and device type extracted from the APL command, and 3) correlated periodic
reporting patterns. We used traditional accuracy metrics to evaluate parameters
1 and 3. As a particular inferred APL command always yields the same results for
event and device, we evaluated parameter 2 using the proposed Device Score scheme.


Traditional Metrics: We use True Positive Rate (TPR) and False Negative
Rate (FNR) to specify the rate of correct and missed (or out-of-order) observations,
respectively. As the evaluation results indicate, there are no False Positive
(FP) or True Negative (TN) outcomes; hence, we calculate accuracy, i.e., the
ratio of correctly inferred observations to the total number of observations, as:


$$\text{TPR (recall)} = \frac{TP}{TP + FN} \qquad (2)$$

$$\text{FNR} = \frac{FN}{TP + FN} \qquad (3)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$
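
These definitions translate directly into code. In the sketch below, the function name is ours, and treating the inferred packets of the Zone Status Change row of Table 6 as true positives and the remaining packets as false negatives is our reading of that table.

```python
from typing import Tuple


def rates(tp: int, fn: int, fp: int = 0, tn: int = 0) -> Tuple[float, float, float]:
    """Return (TPR, FNR, accuracy) as defined in Eqs. (2)-(4)."""
    tpr = tp / (tp + fn)
    fnr = fn / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tpr, fnr, accuracy


# Zone Status Change row of Table 6: 2712 of 2916 packets inferred.
tpr, fnr, accuracy = rates(tp=2712, fn=2916 - 2712)
```

Because FP and TN are zero in this evaluation, accuracy coincides with the TPR, and TPR and FNR sum to one.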


Score (Short for Device Score): It determines the amount of device and 
event information extracted from the inferred APL command and device OUI. 
We calculate Score as a sum of device type (DT), event type (ET) and manu- 
facturer’s identity (M), with weights of each attribute defined in Table 2. 


Score = M + DT + ET (5) 
Table 2. Score Table for Evaluating Command Inference Approach


Attribute        | Score            | Example
Manufacturer (M) | 0 = SOC OUI      | SiliconL, Ember, TexasIns, NordiacSE ..
                 | 1 = Real MAC OUI | PhilipsL, OSRAM, SmartThi, Zhejiang ..
Device Type (DT) | 0 = Unidentified | -
                 | 1 = Uncertain    | door lock or bulb (different commands)
                 | 1.5 = Indistinct | either outlet or bulb? (same command)
                 | 2 = Identified   | Outlet, door lock, motion sensor, bulb ..
Event Type (ET)  | 0 = Unidentified | -
                 | 1 = Uncertain    | lock/unlock or on/off (different commands)
                 | 1.5 = Indistinct | either door lock or unlock? (same commands)
                 | 2 = Identified   | motion detected, color change, etc ..


To understand Score, consider switching on a bulb that triggers a functionality- 
specific APL command from ZC to ZED of payload size 11 bytes. The highest 
Score is 5 when all attributes are correctly inferred, and lowest is 0 when noth- 
ing is inferred. As per Table 1, the command is either Lock/unlock or On/off. From 
Table 2, DT and ET are 1 if these two commands are indistinguishable. For On/off 
command, DT (bulbs or outlet) and ET (on or off) are 1.5, whereas for Lock/unlock 
command, DT is 2 (lock) while ET is 1.5 (lock or unlock). 
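
The Score of Eq. (5) with the weights of Table 2 can be computed as in the following sketch; the dictionary keys and the function name are illustrative labels of ours.

```python
# Weights from Table 2; the string keys are hypothetical labels.
M_WEIGHT = {"soc": 0, "real": 1}
LEVEL_WEIGHT = {"unidentified": 0, "uncertain": 1, "indistinct": 1.5, "identified": 2}


def device_score(manufacturer: str, device_type: str, event_type: str) -> float:
    """Score = M + DT + ET, ranging from 0 (nothing inferred) to 5 (all inferred)."""
    return M_WEIGHT[manufacturer] + LEVEL_WEIGHT[device_type] + LEVEL_WEIGHT[event_type]
```

For the bulb example above, an On/off command with a real MAC OUI gives `device_score("real", "indistinct", "indistinct")`, i.e. 1 + 1.5 + 1.5 = 4, while a Lock/unlock command behind an SoC OUI gives 0 + 2 + 1.5 = 3.5.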


4.4 Device and Event Identification Using Inferred APL Command 


Controlled Evaluation in RF Shield: We simultaneously paired all compat- 
ible devices with one hub at a time inside the RF shield and generated events 
randomly. From the sniffed traffic, ZLeaks inferred functionality-specific APL 
commands and MAC OUI for each device to determine triggered events and 
devices. Since the inferred APL command and MAC OUI remain same for a par- 
ticular device-event pair (e.g., color change for Sengled bulb), the Score remains 
same for every event prompt irrespective of the hub. Therefore, Table 3 reports 
findings of each device once. We see that distinct events like color change, motion 
detected, etc., are easier to infer than binary events (e.g., on/off). The Philips bulb is an
exception here as it uses distinct commands to represent on and off events. Fur- 
thermore, we identified various sensors from a single Zone Status command based 
on behavioral consistencies (refer to rules in Table 1). To conclude, the Score is 
dependent on the correct identification of the APL command and the MAC 
OUI showing the real manufacturer e.g., PhilipsL (Philips), SmartThi/Samjin 
(SmartThings), Ledvance (OSRAM), Zhejiang (Sengled), Jennic (Aqara), etc. 
ZLeaks identified all devices with an average Score of 4.3 out of 5 (i.e., 86.3% of the
information was successfully extracted).


Realistic Evaluation in an IoT “Living Lab”: Next, we shifted all these 
devices, hubs, and 8 unknown (unseen) devices to the IoT lab. Again, we simul- 
taneously paired all devices to one hub at a time, generated random events, 
and analyzed the traffic with ZLeaks. Despite the noisy environment, the known 
Table 3. Controlled Evaluation: Identifying Devices and Events using Inferred APL
Commands. Here, SMT = SmartThings, M = Manufacturer, DT = Device Type, ET =
Event Type, and * = burst repeats after a few seconds


Device (Model)                   | Event        | OUI      | Command (#)      | M | DT  | ET  | Score
Philips Hue Color Bulb (LCA-003) | Off          | PhilipsL | Off with effect  | 1 | 2   | 2   | 5
                                 | On           | PhilipsL | On/off: On       | 1 | 2   | 2   | 5
                                 | Color change | PhilipsL | Color Control    | 1 | 2   | 2   | 5
                                 | Dim          | PhilipsL | Level Control    | 1 | 2   | 2   | 5
Sengled Color Bulb (E11-N1BA)    | Color change | Zhejiang | Color Control    | 1 | 2   | 2   | 5
                                 | Dim          | Zhejiang | Level Control    | 1 | 2   | 2   | 5
                                 | On/off       | Zhejiang | On/Off           | 1 | 1.5 | 1.5 | 4
Sengled White Bulb G14           | On/Off       | Zhejiang | On/Off           | 1 | 1.5 | 1.5 | 4
Centralite Outlet (Mini)         | On/Off       | siliconL | On/Off           | 0 | 1.5 | 1.5 | 3
Sonoff Outlet (S31 Lite)         | On/Off       | texasIns | On/Off           | 0 | 1.5 | 1.5 | 3
SMT Outlet (US-2)                | On/Off       | Smartthi | On/Off           | 1 | 1.5 | 1.5 | 4
SMT Motion sensor (IM)           | Motion       | Smartthi | Zone Status (1*) | 1 | 2   | 2   | 5
SMT Multisensor (250)            | Open/close   | samjin   | Zone Status (1)  | 1 | 2   | 1.5 | 4.5
Ecolink Water Sensor             | Water leak   | ember    | Zone Status (2)  | 0 | 2   | 2   | 4
Ecolink Sound Sensor             | Sound        | ember    | Zone Status (3)  | 0 | 2   | 2   | 4
Yale Door lock (YRD226)          | Lock/unlock  | ember    | Lock/Unlock      | 0 | 2   | 1.5 | 3.5


Table 4. Realistic Evaluation: Identifying Unknown Devices and Events using Inferred
Commands. (M = Manufacturer, DT = Device Type, ET = Event Type, * = repeats)


Device (Model)                     | Event        | OUI      | Command (#)      | M | DT  | ET  | Score
Philips White Bulb                 | Off          | PhilipsL | [illegible]      | 1 | 2   | 2   | 5
                                   | On           | PhilipsL | On/off: On       | 1 | 2   | 2   | 5
OSRAM Color Bulb (Sylvania Smart+) | On/off       | ledvance | On/Off           | 1 | 2   | 1   | 4
                                   | Color change | ledvance | Color Control    | 1 | 2   | 2   | 5
                                   | Dim          | ledvance | Level Control    | 1 | 2   | 2   | 5
SmartThings (SMT) Bulb             | On/off       | SiliconL | On/Off           | 0 | 1.5 | 1.5 | 3
Aqara Outlet (US)                  | On/off       | jennic   | On/Off           | 1 | 1.5 | 1.5 | 4
Ewelink Outlet (SA-003)            | On/off       | TexasIns | On/Off           | 0 | 1.5 | 1.5 | 3
SMT Motion Sensor IRM              | Motion       | samjin   | Zone Status (1*) | 1 | 2   | 2   | 5
Visonic Door sensor MCT            | Open/close   | ember    | Zone Status (1)  | 0 | 2   | 1.5 | 3.5
Schlage Lock (Connect)             | Lock/unlock  | siliconL | Lock/Unlock      | 0 | 2   | 1.5 | 3.5


devices exhibited the same Score as reported in Table 3. The experimental results 
for unknown devices are presented in Table 4. Unknown devices with real MAC 
OUI and distinct event types, e.g., color change for Sengled bulb, were accurately
identified by ZLeaks. Overall, ZLeaks identified unknown devices with
an average Score of 4.2 out of 5 (i.e., 83.6% of the device and event information was extracted). We
conclude that despite devices exhibiting unique event signatures across different
hubs, the functionality-specific APL command remains the same and can be used to
effectively identify any unknown device with a single event trigger.
Table 5. Public Evaluation: Identifying Unknown Devices and Events using Inferred
Commands. (M = Manufacturer, DT = Device Type, ET = Event Type, * = repeats)


Source                 | Unknown Device              | MAC OUI | Command (#)      | M | DT  | ET  | Score
Wireshark ZCL log [16] | Motion Sensor 1             | private | Zone Status (1*) | 0 | 2   | 2   | 4
                       | Motion Sensor 2             | none    | Zone Status (1*) | 0 | 2   | 2   | 4
Zigator [17] Sth2-duos | SmartThings Outlet (IM6001) | samjin  | On/off           | 1 | 1.5 | 1.5 | 4


Table 6. Evaluating APL Command Inference rules on Public Zigbee Capture [17]. 
Note: * implies that the command is identified, but not the state. 


APL Commands                  | Total Packets | Inferred Packets | Accuracy (%)
Zone Status Change            | 2916          | 2712             | 93.0
ZCL On || ZCL Off             | 2423          | 2175*            | 89.8
Door lock || Unlock Request   | 676           | 596*             | 88.1
Door lock || Unlock Response  | 403           | 370*             | 91.8
Color Control, Level Control  | 0             | 0                | 0


Open World Evaluation on Public Captures: We evaluated ZLeaks over 
public Zigbee captures and reported results in Table 5. In capture 1 [16], we found 
2 unknown devices that were recognized as motion sensors due to the presence of 
repetitive Zone Change commands. Capture 2 [17] had 1 unknown device which 
used On/Off for events. Note that we removed device commissioning traffic 
(including Network key) from both files to comply with our threat model. 

As device identification depends on the correct inference of functionality-
specific APL commands, we also evaluated the ZLeaks inference rules on capture
2 [17]. The results in Table 6 indicate that ZLeaks inferred functionality-specific
APL commands with 91.2% accuracy. We also applied our command inference
strategy to generic APL commands and were able to infer Device Announcement,
Bind Request and Response (RR), Link Quality RR, NWK Address RR, Parent
Announcement RR, etc., with 100% accuracy. Most notably, the 6 NWK commands
that Zigator [9] could not identify were inferred with 85.7% accuracy.
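
The 91.2% figure is consistent with a packet-weighted accuracy over the four non-empty rows of Table 6, rather than a plain mean of the per-row percentages (which would be closer to 90.7%). This reading is our assumption; the text does not state how the aggregate was formed. A minimal sketch of the computation, with the function name chosen here for illustration:

```python
# Packet counts copied from Table 6 as (total packets, inferred packets).
# The empty Color Control / Level Control row is left out.
table6 = {
    "Zone Status Change": (2916, 2712),
    "ZCL On || ZCL Off": (2423, 2175),
    "Door lock || Unlock Request": (676, 596),
    "Door lock || Unlock Response": (403, 370),
}


def overall_accuracy(rows):
    """Packet-weighted accuracy in percent (assumed aggregation rule)."""
    total = sum(t for t, _ in rows.values())
    inferred = sum(i for _, i in rows.values())
    return 100.0 * inferred / total


print(f"{overall_accuracy(table6):.1f}%")
```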


4.5 Device Identification Using Periodic Reporting Patterns 


Controlled Evaluation in RF Shield: We simultaneously paired all known
devices with one hub at a time in an RF shield and left them in the idle state
for at least 3h. This way, devices reporting their attributes every 5 or 10 min
yielded 36 and 18 reporting patterns, respectively, which are sufficient to evaluate
two main features: reproducibility and uniqueness of periodic signatures. Table 7
summarizes the results of this experiment, with reporting intervals in seconds,
minutes, and hours denoted by the letters s, m, and h. Note that several devices
exhibited more than one reporting pattern, e.g., for battery, temperature, etc.,
while a few devices showed a different number of reporting patterns across different
hubs, e.g., SMT and Sonoff outlet. This essentially helped identify both the

Table 7. Controlled Evaluation of Periodic Reporting scheme. Here, SMT =
SmartThings, RI = Reporting interval, TPR = True positive rate, FNR = False
negative rate

Device                  | SMT v2 Hub               | Amazon Echo+             | Philips/Sengled
                        | RI       | TPR   | FNR   | RI       | TPR   | FNR   | RI           | TPR | FNR
Centralite Outlet       | 5, 10m   | 1     | 0     | 5, 9m    | 0.982 | 0.018 | N/A          |     |
Sonoff Outlet           | 5m       | 1     | 0     | 5, 10m   | 1     | 0     | N/A          |     |
SMT Outlet              | 5, 10m   | 1     | 0     | 10m      | 1     | 0     | N/A          |     |
Sengled White Bulb      | 5m       | 1     | 0     | 10m      | 1     | 0     | (5, 20, 25m) | 1   | 0
Sengled Color Bulb      | 10m, 1h  | 1     | 0     | (10m, 1h)| 1     | 0     | (5, 20, 25m) | 1   | 0
Philips Hue Color Bulb  | 1s, 2m   | 1     | 0     | 1s, 2m   | 1     | 0     | 1s, 2m       | 1   | 0
SMT Motion sensor (IM)  | 5m       | 1     | 0     | 5m       | 1     | 0     | N/A          |     |
SMT Multisensor         | 5m, 1h   | 0.975 | 0.025 | 5m, 1h   | 1     | 0     | N/A          |     |
Ecolink water sensor    | 30, 30m  | 1     | 0     | N/A      |       |       | N/A          |     |
Ecolink sound sensor    | 27, 30m  | 1     | 0     | N/A      |       |       | N/A          |     |
Yale Door Lock          | 1h       | 1     | 0     | 10m      | 1     | 0     | N/A          |     |


Table 8. Realistic Evaluation of Periodic Reporting scheme. (● = successful device
identification, ◐ = success using additional info, and ○ = resembled other device)

[Table body not recoverable: one symbol per device and observation time (15m,
30m, 1h, 3h) for each of the SMT v2 Hub, Amazon Echo+, and Vendor Hub. Devices
listed: Centralite Outlet, Sonoff Outlet, SmartThings (SMT) Outlet, Sengled
(White) Bulb, Sengled (Color) Bulb, Philips Hue (Color) Bulb, SMT Motion sensor
(IM), SMT Multi sensor, Ecolink water sensor, Ecolink sound sensor, Yale Door
Lock. The two Ecolink sensors carry a "Not compatible" entry.]


device and the smart hub from the encrypted traffic. It is evident from a high 
average TPR of 0.998 and low FNR of 0.002 that the periodic signatures were 
identifiable and consistent over time, except once when the Centralite outlet and 
SMT Multisensor showed two out-of-order packets and were not identified. 
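
The numbers in this controlled experiment can be re-derived with a few lines of arithmetic. The sketch below assumes a 3h (180 min) observation window and treats Table 7 as containing 23 device/hub pairs with a TPR value, two of which are below 1 (0.982 and 0.975); that count is our reading of the table, not a figure stated in the text.

```python
OBSERVATION_MIN = 3 * 60  # devices were left idle for at least 3 h


def cycles(interval_min, window_min=OBSERVATION_MIN):
    """Number of complete reporting cycles that fit in the window."""
    return window_min // interval_min


# TPR per device/hub pair as read from Table 7 (assumed: 23 pairs in total).
tpr = [1.0] * 21 + [0.982, 0.975]
mean_tpr = sum(tpr) / len(tpr)
mean_fnr = 1.0 - mean_tpr

print(cycles(5), cycles(10))
print(f"{mean_tpr:.3f} {mean_fnr:.3f}")
```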


Realistic Evaluation in IoT “Living Lab”: Next, we shifted all the hubs, 
and known and unknown devices to the Mon(IoT)r lab. We paired all com- 
patible devices to one target hub at a time and used the remaining devices as 
background Zigbee noise sources. The devices were left in the idle state for 3h,
and ZLeaks analyzed traffic after specific time intervals (15 min, 30 min, 1h, and
3h). In Table 8, devices that were distinctively identified after the specified time 
are marked with a full circle, e.g., all outlets had reporting intervals of 5 and/or 
10 min; and were successfully identified within 15 min. Half-circle indicates iden- 
tical reporting pattern and interval for two devices, e.g., both Yale and Schlage 
lock (unknown device) reported the same pattern after 1h. In such a case, we 
used additional parameters (e.g., MAC OUI or logical device type) to identify 
the device. Finally, an empty circle depicts a complete resemblance between two 
devices, e.g., SMT motion sensor IRM (an unknown device) and SMT multisensor
showed the same patterns until the latter device reported its second pattern after
1h. In essence, it is quite concerning that the device and manufacturer identity
is leaked even in the device’s idle state. Note that we could not evaluate this 
approach on public captures due to the absence of periodic reporting patterns. 
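
The effect of the observation window on distinguishability can be illustrated with a toy model. The sketch below reduces a periodic signature to a set of reporting intervals, which is a simplification: ZLeaks also uses the reporting pattern itself. The device names and interval values are hypothetical stand-ins for the motion sensor and multisensor example above.

```python
# Hypothetical signatures: sets of reporting intervals in minutes.
signatures = {
    "motion sensor": {5},
    "multisensor": {5, 60},
}


def observable(intervals, window_min):
    """Intervals that have had time to show up at least once."""
    return frozenset(i for i in intervals if i <= window_min)


def lookalikes(name, window_min, sigs=signatures):
    """Other devices that look identical to `name` within the window."""
    mine = observable(sigs[name], window_min)
    return sorted(
        other
        for other, s in sigs.items()
        if other != name and observable(s, window_min) == mine
    )


for window in (15, 30, 60, 180):
    print(window, lookalikes("multisensor", window))
```

In this model the two devices are indistinguishable for windows shorter than one hour and separate once the second, hourly report has been observed.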


5 Discussion and Related Work 


5.1 Security Implications of Leaked Data 


Knowledge of the devices in a smart home and their states (e.g., door unlocked
or bulb off) is crucial to the smart home’s security. A burglar can use this
information to gain insight into users’ affluence and determine when the house is
vulnerable to intrusion. In addition, an attacker can use the Common
Vulnerabilities and Exposures (CVE) database [23] to find and exploit unpatched
vulnerabilities in the identified devices. The vulnerable devices can be
weaponized to spread malware to the network [24], create IoT botnets [25], or
carry out denial-of-service attacks. The attacker may also use a side-channel
attack to hijack the vulnerable hub [26]. From a business perspective, the leaked
information can help Zigbee manufacturers gain deep insights into users’ usage
and activity patterns. This information can be sold to advertisers for
interest-based advertisements and online tracking, or used in business decisions
on future products. In short, our study provides deep insights into potential
information leakages right at the source.


5.2 Potential Countermeasures 


ZLeaks demonstrates the significance of unencrypted metadata (MAC OUI,
frame, and payload lengths) in identifying functionality-specific commands,
events, and devices in the Zigbee network. Although exponential padding [27]
effectively disguises payload lengths, it adds transmission overhead and increases
power consumption for low-power Zigbee devices. We suggest padding each payload
with a random number of bytes (e.g., 0, 1, 2 or 3 bytes) and using the reserved
field in the Zigbee security header to denote the number of padded bytes. This
way, even identical APL commands will have four different payload sizes, which
will add enough entropy to make the Zigbee commands indistinguishable. Secondly,
the Zigbee Alliance can mandate the use of the chipset manufacturer’s identifier
as MAC OUI to hide the real manufacturer’s identity. This alone reduces the
average Score calculated for unknown devices using the APL inference approach
from 4.0 (80%) to 3.1 (62%).

Table 9. Privacy Implications of Leaked Information and Suggested Countermeasures.

Info   | Privacy Implication                       | Countermeasure to disguise info
Device | - Hijack vulnerable devices/network       | Use SOC MAC OUI to hide vendor
       | - Insight regarding user’s affluence      | Use random payload padding to
       | - Online advertising                      | complicate command inference
Event  | - Reveals user’s presence/absence, daily  | Attribute Pipelining OR similar
       |   routine                                 | request-response patterns to hide events
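
The random payload padding countermeasure can be sketched as follows. This is an illustration under stated assumptions, not the Zigbee stack's API: `pad` and `unpad` are hypothetical helpers, the pad length is carried as a plain integer standing in for the reserved bits of the security header, and the interaction with encryption and the message integrity code is ignored.

```python
import secrets

MAX_PAD = 3  # 0..3 padded bytes, so the count fits in a 2-bit field


def pad(payload: bytes):
    """Append 0-3 random bytes; return (pad_len, padded_payload)."""
    n = secrets.randbelow(MAX_PAD + 1)
    return n, payload + secrets.token_bytes(n)


def unpad(pad_len: int, padded: bytes) -> bytes:
    """Strip the padding announced by the (hypothetical) header field."""
    if not 0 <= pad_len <= MAX_PAD or pad_len > len(padded):
        raise ValueError("invalid pad length")
    return padded[: len(padded) - pad_len]


command = bytes([0x01, 0x00, 0x06])  # made-up 3-byte payload
pad_len, on_air = pad(command)
assert unpad(pad_len, on_air) == command
```

With four equally likely pad lengths, each packet length carries log2(4) = 2 bits of uncertainty, so commands whose unpadded lengths differ by at most 3 bytes can overlap on the air.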


The volumetric analysis also provides significant hints regarding the
occurrence of an event or periodic reporting. To make events indistinguishable,
prior research [28,29] leverages a mains-powered ZR or ZC to inject decoy packets
into the traffic at pseudo-random intervals. However, a decoy injector requires
continuous training to avoid detection by the attacker. An efficient way to
disguise events is to have similar event responses and reporting patterns for all
devices. Alternatively, all attributes can be pipelined in a single packet instead
of a series of packets. The suggested countermeasures, as summarized in Table 9,
require significant design and implementation changes in the Zigbee protocol, as
it is hard to prevent the proposed inference attacks with a simple workaround like
using a secure network or link key. We believe this is why the Zigbee Alliance is
involved in a new smart home technology, Matter [30], which has security as the
fundamental design tenet and does not use Zigbee as the underlying IoT protocol.


5.3 Related Work 


Privacy Analysis of Smart Home’s IP Traffic: Several research studies have
analysed the encrypted IP network traffic of smart homes to predict devices’
events [4,5], user’s habits [6], device types [1-3,31,32], and network
anomalies [33,34]. A few studies [8,35] also analyzed the IP traffic between the
smart app and cloud to detect misbehaving smart apps. Although these studies yield
promising results, they have a few limitations: 1) the attacker requires physical
access to the network or mobile app, and 2) these approaches exploit traffic
metadata (i.e., payload length, DNS responses, etc.); hence their effectiveness is
questionable under realistic network conditions, e.g., when Virtual Private
Networks or Network Address and Port Translation are enabled. Although recent
studies have leveraged packet-level signatures and temporal packet relations to
identify events [5] and devices [36] despite traffic shaping in place, these
machine learning (ML) approaches require re-training after firmware upgrades to
extract new signatures.


Privacy analysis of Zigbee (non-IP) Traffic: Unlike IP traffic patterns,
Zigbee traffic patterns are challenging to obfuscate using conventional traffic
shaping, as it directly impacts power consumption and battery life. Still, very few
studies [7,8,37-39] have analyzed Zigbee traffic with the intent to study
information leakages right at the source. Zigator [9] exploited unencrypted attributes

Table 10. ZLeaks vs. existing Zigbee based schemes for identifying Event Type (ET),
Device Type (DT), Device Identity (DI) and applicability on Unknown devices (UD)

Research Work | Unique Hubs | Unique Devices | Technique (Feature)
Peekaboo [7]  | 1           | 3              | ML (traffic profiling)
Z-IoT [37]    | 1           | 8              | ML (inter-arrival-times)
IoTGaze [38]  | 1           | 5              | ML (event pattern)
IoTSpy [39]   | 1           | 5              | NLP (frame len (fl), direction)
Homonit [8]   | 1           | 7              | Levenshtein Distance (fl, direction)
ZLeaks        | 5           | 19             | Command Inference (metadata);
              |             |                | Correlation (periodic reporting)

[The check marks in the Identified columns (ET, DT, DI, UD) are not recoverable.]


of Zigbee frames, notably packet length, directions, radius, and logical device 
type, to infer 6 out of 12 encrypted NWK commands. However, this inference 
approach does not apply to APL commands. Peekaboo [7] exploited traffic rate 
variations, and IoTSpy [39] leveraged packet sequence features to fingerprint 
known IoT events of merely 3 and 5 Zigbee devices, respectively. In addition, 
Homonit [8] and IoTGaze [38] analyzed Zigbee event patterns to detect malicious 
smart home apps. However, all these studies are confined to the identification of 
known events using a-priori event fingerprints. In contrast, ZLeaks infers event as 
well as device information without collecting event fingerprints for every device. 
Another study, Z-IoT [37] employed ML to identify device type by exploiting 
inter-arrival-time of NWK frames and IEEE 802.15.4 Data requests of the idle 
device. In contrast, ZLeaks exploits the device’s periodic reporting interval and 
pattern (based on APL commands) to identify the device type and the device 
with 99.8% accuracy. As evident from Table 10, our study was conducted on the 
largest device set spanning 5 hubs and 19 unique Zigbee devices. 


Security of Zigbee Protocol: Several attacks have been demonstrated against
the Zigbee protocol so far, such as selective jamming [9], worm chaining [24],
command injection [40], replaying [41], etc., with the aim of recovering the
Network key or making the target devices malfunction. Unlike ZLeaks, these attacks
either rely on a leaked global link key or install (QR) codes, or require the
attacker’s presence during the device’s setup to identify key material.


6 Conclusion 


This work highlighted that the power optimization-oriented design of the Zigbee
protocol has destroyed the legal concept of privacy in smart homes. We presented
ZLeaks [14], a privacy analysis tool that employs two inference techniques to
demonstrate how easily a passive eavesdropper can determine in-home devices
and events from the encrypted traffic, using a low-cost wireless Zigbee sniffer (TI
CC2531). The evaluation conducted on an exhaustive set of 19 unique Zigbee
devices and 5 smart hubs indicates that the ZLeaks command inference technique
identified unknown events and devices with 83.6% accuracy, without using event
signatures. In addition, the ZLeaks periodic reporting technique identified known
devices in the absence of any user activity with 99.8% accuracy. Finally, we
evaluated our command inference rules on a third-party capture file and identified
functionality-specific APL commands with 91.2% accuracy, irrespective of the
secret keys. We conclude that the proposed inference attacks are impossible to
prevent without making significant design changes in the Zigbee protocol.

Abstract. While storing documents on the cloud can be attractive, the
question remains whether cloud providers can be trusted with storing pri- 
vate documents. Even if trusted, data breaches are ubiquitous. To pre- 
vent information leakage one can store documents encrypted. If encrypted 
under traditional schemes, one loses the ability to perform simple oper- 
ations over the documents, such as searching through them. Searchable 
encryption schemes were proposed allowing some search functionality 
while documents remain encrypted. Orthogonally, research is done to find 
attacks that exploit search and access pattern leakage that most efficient 
schemes have. One type of such an attack is the ability to recover plaintext 
queries. Passive query-recovery attacks on single-keyword search schemes 
have been proposed in literature, however, conjunctive keyword search has 
not been considered, although keyword searches with two or three key- 
words appear more frequently in online searches. 

We introduce a generic extension strategy for existing passive query- 
recovery attacks against single-keyword search schemes and explore its 
applicability for the attack presented by Damie et al. (USENIX Security 
’21). While the original attack achieves a recovery rate of up to 85% against
single-keyword search schemes for an attacker without exact background
knowledge, our experiments show that the generic extension to conjunctive
queries comes with a significant performance decrease, achieving recovery
rates of at most 32%. Assuming a stronger attacker with partial knowledge
of the indexed document set boosts the recovery rate to 85% for conjunctive
keyword queries with two keywords and achieves similar recovery rates
as previous attacks by Cash et al. (CCS ’15) and Islam et al. (NDSS ’12)
in the same setting for single-keyword search schemes.


Keywords: Searchable encryption · Conjunctive keyword search ·
Passive query-recovery attack

1 Introduction


With an increasing number of enterprises storing their documents in the cloud,
the question arises how to cope with storing sensitive documents on the cloud
without the cloud provider learning information about the stored documents
or information being leaked when a data breach occurs. One solution to this
problem would be to encrypt the documents to hide their contents from the cloud
provider. However, this prevents users from using the (often available)
computational resources cloud providers offer, since searching through the
documents is no longer possible without first downloading and decrypting them.

Searchable symmetric encryption schemes can be a solution to this problem:
they offer constructions for search functionalities over encrypted documents. The
first practical solution towards searchable encryption was proposed by Song
et al. [22]. Proposers of searchable encryption schemes need to find a trade-off
between efficiency, security, and functionality. With this trade-off in terms of
security comes information leakage, such as possible search pattern leakage
(revealing which queries concerned the same underlying, but unknown, keyword) and
access pattern leakage (revealing the identifiers of all documents matching the
search query). Most of the efficient searchable encryption schemes that allow for
keyword search leak information in the access pattern for efficiency.

Searchable encryption is an active line of research for finding efficient schemes 
that allow for search in encrypted documents with well-defined security in terms 
of a leakage function. Orthogonally, research is performed on finding attacks 
against proposed searchable encryption schemes. One such type of attack is a 
query-recovery attack, i.e. the ability for an adversary to recover the plaintexts 
from performed queries. In general two kinds of query-recovery attacks exist: (1) 
a passive attack where an adversary only has access to the information leaked 
by a scheme and (2) an active attack in which an adversary is able to inject 
tailored documents into the to-be-searched dataset. 

Active query-recovery attacks on conjunctive keyword search do exist [18,
28]; they are described as extensions of the proposed single-keyword search
attacks. Currently, all existing passive query-recovery attacks against searchable
symmetric encryption that allows for keyword searches focus only on
single-keyword search schemes. However, these attacks do not reflect a realistic
scenario, since single-keyword searches are limited and statistics show that the
number of keywords used by people online in the US peaks at two keywords [5].
Also, three-keyword searches are still more frequent than searches for a single
keyword. The frequency of searches using seven or more keywords becomes
negligible.

Note that the recovery of conjunctive keyword queries is more difficult with 
respect to the recovery of single-keyword queries using similar vocabulary sizes. 
This difficulty stems from the fact that the space for keyword conjunctions is 
combinatorial in the number of conjunction terms compared to single-keywords, 
therefore an attacker needs to consider more possible candidates of keyword 
conjunctions for each observed query. 
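
To make the growth concrete, the sketch below counts the candidate conjunctions for a vocabulary of 500 keywords, the size used later in the experiments. It assumes conjunctions are unordered sets of d distinct keywords, so the count is the binomial coefficient; the function name is ours.

```python
from math import comb


def candidate_space(vocab_size: int, d: int) -> int:
    """Number of distinct d-keyword conjunctions over the vocabulary."""
    return comb(vocab_size, d)


for d in (1, 2, 3):
    print(d, candidate_space(500, d))
```

For 500 keywords this gives 500 single keywords, 124,750 two-keyword conjunctions, and 20,708,500 three-keyword conjunctions.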

In this work, we explore a passive query-recovery attack against secure
conjunctive keyword search (CKWS) schemes. We propose a generic extension
strategy for query-recovery attacks against single-keyword search to recover
conjunctive queries using the same attack. Our extension strategy is based on the
use of trapdoors created from a keyword-conjunction set as a generalization of
trapdoors created from single keywords, i.e., it replaces keywords with keyword
conjunction sets. Our attack is static and also works on forward and backward
private schemes [17].

We introduce an adaptation of the query-recovery attack proposed by Damie 
et al. [6] to achieve keyword conjunction recovery. We explore the applicability of 
the attack in two setups: (1) a similar-documents attack, where the attacker only 
has access to a set of documents that is similar, but otherwise different, from the 
indexed documents and (2) a known-documents attack, where the attacker has 
(partial) knowledge of the indexed documents. In both setups it is assumed the 
attacker knows the keyword conjunctions for a small set of queries a priori. We 
experimentally show that our attack can work for a relatively small vocabulary 
size (500) in an attack setup allowing only conjunctive keyword search using 2 
keywords. However, we show that in an attack setup using similar-documents 
the attack performs poorly unless many known queries are assumed to be part of 
the attacker’s knowledge. Furthermore, we demonstrate limitations of our generic 
extension posed by the combinatorial complexity increase for larger conjunctions. 


2 Related Work 


Most attacks against searchable symmetric encryption that have been described
in the literature are query-recovery attacks. Islam et al. [10] were the first to
propose a passive query-recovery attack, which exploits the access pattern
leakage, i.e. leaked document identifiers from observed queries. In their
attack, the adversary needs to know all the documents indexed on the server to
be successful. They introduced the idea of computing (word-word and
trapdoor-trapdoor) co-occurrences to attack SSE. This idea has been reused by
other passive attacks. The attack works by finding the closest mapping between the
word-word co-occurrence matrix and the trapdoor-trapdoor co-occurrence matrix,
for which they use the simulated annealing metaheuristic. Also, the attack
requires a number of known queries to work, i.e. trapdoors for which the attacker
knows the underlying plaintext value.
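
The two co-occurrence matrices can be sketched as follows. The normalization by the number of documents and the toy data are assumptions made for illustration; the matching step (simulated annealing) is not shown.

```python
def cooccurrence(docs, vocab):
    """Word-word co-occurrence: fraction of docs containing both words."""
    n = len(docs)
    return [
        [sum(1 for doc in docs if a in doc and b in doc) / n for b in vocab]
        for a in vocab
    ]


def trapdoor_cooccurrence(results, n_docs):
    """Trapdoor-trapdoor co-occurrence from leaked result sets."""
    tds = sorted(results)
    return [[len(results[s] & results[t]) / n_docs for t in tds] for s in tds]


# Toy data: the attacker knows every indexed document.
docs = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]
vocab = ["a", "b", "c"]
# Leaked access pattern: opaque trapdoor -> matching document identifiers.
results = {"t1": {0, 1, 3}, "t2": {0, 2, 3}, "t3": {2, 3}}

C_kw = cooccurrence(docs, vocab)
C_td = trapdoor_cooccurrence(results, len(docs))
```

In this toy setting, where the attacker knows all documents and the trapdoors happen to be listed in the same order as the keywords, the two matrices coincide; in general the attack has to search for the permutation that aligns them.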

Cash et al. [3] proposed another passive query-recovery attack. Their attack 
first exploits that keywords with high frequency have unique keyword document 
counts to initialize their set of known queries. Then for keywords that do not have 
a unique keyword document occurrence count they construct a co-occurrence 
matrix of their known documents and observed queries, similar to Islam et al. 
They try to recover more queries by constructing for every unknown query their 
candidate set (i.e. keywords having the same document occurrence count) and 
remove candidates from the set that do not have the same co-occurrence with a 
known query in the known queries set. If after iterating over every known query 
only one candidate is left, the last candidate is appended to the known queries 
set. This process is repeated for all unknown queries until the set of known 
queries stops increasing.

Both [3,10] rely on the attacker knowing a large part of the indexed
documents, where the count attack performs better than the attack by Islam et
al. However, their query recovery rates only increase noticeably once the attacker
knows at least 80% of the indexed documents.
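
The count attack described above can be sketched in a few lines. This is a simplified illustration, not the implementation of Cash et al.: it assumes the attacker knows every indexed document (so counts match exactly), and the function and variable names are ours. Only counts are compared, so the attacker's document numbering does not need to match the server's identifiers.

```python
def count_attack(known_docs, results):
    """known_docs: list of keyword sets; results: trapdoor -> set of doc ids."""
    vocab = sorted(set().union(*known_docs))
    ids = {w: {i for i, doc in enumerate(known_docs) if w in doc} for w in vocab}

    # Step 1: a result-set size shared by exactly one keyword identifies it.
    known = {}
    for td, r in results.items():
        cands = [w for w in vocab if len(ids[w]) == len(r)]
        if len(cands) == 1:
            known[td] = cands[0]

    # Step 2: prune candidates via co-occurrence counts with known queries,
    # repeating until the set of known queries stops growing.
    grew = True
    while grew:
        grew = False
        for td, r in results.items():
            if td in known:
                continue
            cands = [
                w for w in vocab
                if len(ids[w]) == len(r)
                and all(
                    len(ids[w] & ids[kw]) == len(r & results[ktd])
                    for ktd, kw in known.items()
                )
            ]
            if len(cands) == 1:
                known[td] = cands[0]
                grew = True
    return known


docs = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"c"}, {"d"}]
# Server-side identifiers differ from the attacker's numbering.
leak = {"t1": {11, 13}, "t2": {10, 11, 12}, "t3": {12, 14}, "t4": {10, 12}}
recovered = count_attack(docs, leak)
```

Here only keyword `a` has a unique document count; the other three keywords are resolved in later rounds through their co-occurrence counts with already recovered queries.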

The query-recovery attack proposed by Pouliot et al. [21] uses weighted graph
matching, where the attacker needs to find a mapping between keyword graph G and
trapdoor graph H. The attack achieves recovery rates above 90% when the attacker
knows the entire set of indexed documents, but fails as a similar-documents attack
unless the set of documents and the vocabulary size are small. Also, the runtime
of the attack increases rapidly: for a vocabulary size of 500 the attack runs
in less than one hour, whereas it takes more than 16 h for a vocabulary size of
1000. The attack in [10] has a runtime of at most 14 h, whereas the attacks
from [3,6] run in seconds.

Ning et al. [15] introduced a query-recovery attack that works when the 
attacker knows a percentage of the indexed documents. Keywords and trapdoors 
are represented as a binary string where the i-th bit is 1 if the keyword (resp. 
trapdoor) occurs in document i. Recovery is done by converting the bit strings 
to integers, where it is considered that a keyword corresponds to a trapdoor if 
they have the same integer value. 
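
The bit-string matching idea can be sketched as below. Two points are assumptions of this illustration rather than details taken from [15]: the attacker can link each known document to its server-side identifier, and a trapdoor is only assigned a keyword when exactly one keyword shares its integer value.

```python
def occurrence_int(doc_ids, known_ids):
    """Bit i is 1 iff the i-th known document is contained in doc_ids."""
    return sum(1 << i for i, doc in enumerate(known_ids) if doc in doc_ids)


def recover(keyword_docs, trapdoor_docs, known_ids):
    """Match trapdoors to keywords that have the same occurrence integer."""
    by_value = {}
    for kw, ids in keyword_docs.items():
        by_value.setdefault(occurrence_int(ids, known_ids), []).append(kw)
    recovered = {}
    for td, ids in trapdoor_docs.items():
        match = by_value.get(occurrence_int(ids, known_ids), [])
        if len(match) == 1:  # accept only unambiguous matches
            recovered[td] = match[0]
    return recovered


known_ids = [10, 11, 12]  # documents the attacker knows
keyword_docs = {"alice": {10, 12}, "bob": {11}, "carol": {10, 11}}
# Leaked results also contain documents 13 and 14, unknown to the attacker.
trapdoor_docs = {"t1": {11, 13}, "t2": {10, 12, 14}, "t3": {10, 11}}
result = recover(keyword_docs, trapdoor_docs, known_ids)
```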

Their attack outperforms the attack by Cash et al. [3]: in their scenario, [3]
achieves a recovery rate of roughly 28%, whereas their proposed attack achieves
around 56% when the attacker knows 80% of the indexed documents. However,
they do not report a recovery rate for an attacker having knowledge of more
than 80% of the indexed documents.

Blackstone et al. [2] proposed a “sub-graph” attack that requires far fewer known
documents to be successful and also works on co-occurrence hiding schemes.
Their experiments show that an attacker only needs to know 20% of the indexed
documents to succeed in her attack.

In [6], Damie et al. proposed their refined score attack that works in a setting
where the attacker only knows a similar, but otherwise different and non-indexed,
set of documents for query-recovery. A mathematical formalization of the
similarity is proposed in their paper. In [3] they showed that both the attack
proposed by Islam et al. [10] and their proposed count attack do not work using
similar documents. In [6], the query-recovery attack uses techniques similar to
those used by [3,10], i.e. constructing co-occurrence matrices from the document
set known by the attacker and a trapdoor-trapdoor co-occurrence matrix from the
assumed access pattern leakage. Starting with a few known (keyword,
trapdoor)-pairs, their attack iteratively recovers queries, where previously
recovered queries with high confidence scores are added to the set of known
queries. Using this approach their attack reaches recovery rates around 85%.


Other Types of Attacks. Zhang et al. [28] proposed an effective active doc- 
ument injection attack to recover keywords. Furthermore, they proposed an 
extension of their attack to a conjunctive keyword search setting which was 
experimentally verified for queries with 3 keywords. 

In [18], Poddar et al. proposed several attacks that use the volume pattern as
auxiliary information in combination with the attacker’s ability to replay queries
and inject documents. Moreover, they also gave an extension of their attack for
queries with conjunctive keywords, which is based on the extension from [28]
using a document injection approach.

Liu et al. [14] proposed a query-recovery attack which makes use of the search
pattern leakage as auxiliary information. In particular, they exploit the query
frequency. However, they simulated their queries by applying Gaussian noise to
keyword search frequencies from Google Trends¹ because of the lack of a query
dataset. The attacker has access to the original frequencies.

Another attack introduced by Oya and Kerschbaum [16] combines both vol- 
ume information derived from the access pattern leakage and query frequency 
information derived from the search pattern leakage as auxiliary information. 


Conjunctive Keyword Search Schemes. Passive query-recovery attacks 
against single-keyword search schemes already work for some conjunctive key- 
word search schemes where the server performs search for each individual key- 
word in a query independently and returns the intersection of document identi- 
fiers of each single-keyword search, i.e. leaking the full access pattern for each 
individual keyword in the conjunction. However, these attacks cannot be applied 
on conjunctive keyword search schemes with less or common access pattern leak- 
age, where common refers to the scheme only leaking the document identifiers for 
the documents containing all keywords from a conjunctive keyword query. Hence, 
in this work we explore one extension strategy for conjunctive keywords that can 
be applied to most passive query-recovery attacks against single-keyword search 
using only common access pattern leakage. 
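
The difference between full and common access pattern leakage can be illustrated with a toy index. The data and function names are hypothetical, and plaintext keywords stand in for the opaque per-keyword tokens a real server would see.

```python
# Toy index: keyword -> identifiers of the documents containing it.
index = {
    "cloud": {1, 2, 3},
    "privacy": {2, 3, 4},
}


def full_leakage(query):
    """Server searches each keyword separately and sees every result set."""
    per_keyword = {kw: index[kw] for kw in query}
    return per_keyword, set.intersection(*per_keyword.values())


def common_leakage(query):
    """Only identifiers of documents matching all keywords are revealed."""
    return set.intersection(*(index[kw] for kw in query))


seen_by_server, answer = full_leakage(("cloud", "privacy"))
only_common = common_leakage(("cloud", "privacy"))
```

Under full leakage a single-keyword attack can be run on each per-keyword result set directly; under common leakage only the intersection is available, which is the setting targeted by the extension strategy.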

Both [19] and [23] proposed such a conjunctive keyword search scheme, which
returns the intersection of document identifiers for each individual keyword in a
conjunctive keyword query, thus leaking the full access pattern. However, we would
like to emphasize that in this scenario only an honest-but-curious server that is
able to observe the result set for each intermediate keyword can be considered an
attacker, since an eavesdropper on the communication channel would not be
able to observe the document identifiers for each intermediate single-keyword
search. Furthermore, it should be noted that both schemes also offer more
functionality than conjunctive keyword search alone: [19] allows for phrase
searches, and [23] offers result set verifiability and index updatability.

Other proposed conjunctive keyword search schemes exist [4,7-9,11,13,24,
26,27]. However, all of them leak at least the common access pattern, where
[4,9,25] have more than common access pattern leakage. To the best of our
knowledge there do not exist efficient conjunctive keyword search schemes that
have no access pattern leakage.


3 Preliminaries 


We first introduce some notations that are used throughout this work. Let
document set $D$ consist of documents $\{D_1, \dots, D_n\}$. Let keyword set $W$ consist of


1 https://trends.google.com/trends.

Notation | Meaning | Size notation
$Q$ | Set of observed trapdoors by the adversary | $l$
$R_Q$ | Document identifiers for each observed $td \in Q$ | $l$
KnownQ | Known $(td, ckw)$-pairs by the adversary | $k$
$ckw_q$ | Set of distinct keywords used in a conjunctive keyword query $q$ | $d$
$C_{ckw}$ | $ckw$-$ckw$ co-occurrence matrix created from $D_{similar}$ or $D_{p\text{-}known}$ | $m_{similar} \times m_{similar}$ or $m_{known} \times m_{known}$
$C_{td}$ | $td$-$td$ co-occurrence matrix created from $R_Q$ | $l \times l$
$D_{real}$ | Real (indexed) document set | $n_{real}$
$D_{similar}$ | Similar document set | $n_{similar}$
$D_{p\text{-}known}$ | $p$-known document set ($0 < p < 1$) | $n_{known}$ ($= p \cdot n_{real}$)
$W_{real}$ | Vocabulary of keywords extracted from $D_{real}$ | $v_{real}$
$W_{similar}$ | Vocabulary of keywords extracted from $D_{similar}$ | $v_{similar}$
$W_{known}$ | Vocabulary of keywords extracted from $D_{p\text{-}known}$ | $v_{known}$
$K_{real}$ | Set containing possible conjunctions of keyword combinations generated from $W_{real}$ | $m_{real} = \binom{v_{real}}{d}$
$K_{similar}$ | Set containing possible conjunctions of keyword combinations generated from $W_{similar}$ | $m_{similar} = \binom{v_{similar}}{d}$
$K_{known}$ | Set containing possible conjunctions of keyword combinations generated from $W_{known}$ | $m_{known} = \binom{v_{known}}{d}$


keywords $\{w_1, \dots, w_m\}$. Document $D_i$ consists of keywords that form a subset of
keyword set $W$. Let $id(D_i) = i$ return the identifier for document $D_i$. We denote
$x \in D_i$ if keyword $x$ ($\in W$) occurs in document $D_i$. A summary of all notations
and their meaning used throughout this work is given in Table 1.


3.1 Searchable Symmetric Encryption 


A searchable encryption scheme allows a user to search in encrypted documents 
and is often described in a client-server setting. The client can search through 
encrypted documents stored on the server, without the server learning informa- 
tion about the plaintext documents. Often a searchable encryption scheme can 
be divided in four algorithms: 


— KeyGen($1^k$): takes security parameter $k$ and outputs a secret key $K$.

— BuildIndex($K$, $D$): takes document set $D$ and secret key $K$ and produces an
(inverted) index $I$.

— Trapdoor($K$, $q$): takes query $q$ and secret key $K$ and outputs a trapdoor $td_q$.

— Search($I$, $td_q$): takes trapdoor $td_q$ and index $I$ and outputs the documents
that match with query $q$.
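
The four algorithms can be illustrated with a deliberately naive toy construction. It is a sketch for intuition only and not one of the schemes cited in this work: trapdoors are HMAC-SHA256 tags over a sorted keyword set, the index precomputes one entry per d-keyword combination of each document, documents themselves are not encrypted, and no security claim is made. All function names are ours.

```python
import hashlib
import hmac
import secrets
from itertools import combinations


def keygen(security_bytes=32):
    """KeyGen: sample a random secret key."""
    return secrets.token_bytes(security_bytes)


def trapdoor(key, query):
    """Trapdoor: deterministic, order-independent token for a keyword set."""
    canonical = "\x00".join(sorted(set(query))).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def build_index(key, docs, d=2):
    """BuildIndex: map the token of each d-keyword combination to doc ids."""
    index = {}
    for doc_id, words in docs.items():
        for combo in combinations(sorted(words), d):
            index.setdefault(trapdoor(key, combo), set()).add(doc_id)
    return index


def search(index, td):
    """Search: return the identifiers stored under the trapdoor."""
    return index.get(td, set())


key = keygen()
docs = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "c"}}
index = build_index(key, docs, d=2)
hits = search(index, trapdoor(key, ["b", "a"]))
```

The server in this toy learns exactly the identifiers of the documents that contain all queried keywords, and the index grows with the number of keyword combinations per document.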


In single-keyword search schemes $q$ corresponds to a keyword $w$, whereas
in conjunctive keyword search schemes $q$ would correspond to a query for
documents containing $d$ keywords, i.e., the conjunction $w_1 \wedge \ldots \wedge w_d$
of keywords $w_1, \ldots, w_d$. Then, $td_q$ would correspond to the conjunction of $d$
keywords.
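The four algorithms can be illustrated with a deliberately simplified toy sketch. It is not one of the schemes cited in this paper and offers no real security guarantees; the function names, the HMAC-based trapdoor, and the precomputed query list are all assumptions made purely for illustration. It only shows how a conjunctive query maps to a deterministic trapdoor and how the server answers with document identifiers.

```python
import hashlib
import hmac
import secrets


def keygen(k: int = 32) -> bytes:
    """KeyGen: return a random secret key of k bytes."""
    return secrets.token_bytes(k)


def trapdoor(key: bytes, query: frozenset) -> bytes:
    """Trapdoor: deterministic tag for a (conjunctive) keyword query."""
    canonical = "\x00".join(sorted(query)).encode()
    return hmac.new(key, canonical, hashlib.sha256).digest()


def build_index(key: bytes, docs: dict, queries: list) -> dict:
    """BuildIndex: map each query's trapdoor to the ids of matching documents."""
    return {
        trapdoor(key, q): sorted(i for i, words in docs.items() if q <= words)
        for q in queries
    }


def search(index: dict, td: bytes) -> list:
    """Search: return the document ids stored under the trapdoor."""
    return index.get(td, [])


# Hypothetical toy data: document id -> set of keywords.
docs = {1: {"alice", "bob", "merger"}, 2: {"alice", "merger"}, 3: {"bob"}}
queries = [frozenset({"alice", "merger"}), frozenset({"alice", "bob"})]
key = keygen()
index = build_index(key, docs, queries)
result = search(index, trapdoor(key, frozenset({"alice", "merger"})))
```

In this toy setting the server sees only the opaque trapdoor and the returned identifiers, which is exactly the kind of leakage the attack discussed later relies on.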

3.2 Considered Conjunctive Keyword Search Model


We assume a fixed number of keywords ($d$) that are allowed to be searched for
in a conjunctive keyword search. For instance, if $d = 2$, only trapdoors with 2
distinct keywords are allowed. We denote such a fixed-$d$ scheme as a secure
$d$-conjunctive keyword search scheme.

For simplicity, we assume a fixed number of $d$ distinct keywords; however, one
could consider $d$ as a maximum number of keywords in the conjunctive search
by reusing the same keyword for unused keyword entries in the conjunction.
For instance, when $d = 2$, $kw \wedge kw$ for the same keyword $kw$ would be equivalent
to a single-keyword search for $kw$.

We consider $ckw$ to be the set of $d$ different keywords that are used to
construct a trapdoor ($td_{ckw}$). For instance, if we consider a conjunctive keyword
search scheme that allows search for $d = 3$ conjunctive keywords, we would
create a keyword set $ckw$ for every possible combination of 3 keywords, where
$ckw_i = \{kw_1, kw_2, kw_3\}$.²

First, in the BuildIndex algorithm, the client encrypts every document in
the document set locally. She then creates an encrypted index of the document
set (also locally). The encrypted document set and index are then uploaded by
the client to the server. Given a trapdoor $td_{ckw}$, the server can find the
documents containing the keywords in $ckw$ using this index.

Although different methods for constructing such an index have been proposed
in the literature, here we do not fix which index is used. We only require the model to
have at least common access pattern leakage, where common refers to the scheme 
only leaking the document identifiers for the documents containing all keywords 
in a conjunctive keyword query. All conjunctive search schemes described in 
Sect. 2 leak at least the common access pattern. 

The client can search documents by constructing trapdoors. The client
constructs a trapdoor by picking the $d$ keywords she wants to search for. In our
model, she constructs a trapdoor using the function
$td_q = \mathrm{Trapdoor}(K, ckw_i = \{kw_1, \ldots, kw_d\})$. When the client sends the
trapdoor $td_q$ to the server, the server responds with a set of document identifiers
$R_{td_q}$ for documents that contain all keywords in $ckw_i$.


3.3 Attacker Model 


As in [6], we consider two types of passive attackers, both of which can observe
trapdoors sent by a user and the corresponding responses, including the document
identifiers. The first type of attacker is an honest-but-curious server. The server
is considered to be an honest entity, meaning it follows the protocol. Hence, it
always returns the correct result for each query. However, such a curious server
tries to learn as much information as possible using the scheme leakage. The
second type of attacker is an eavesdropper that is able to observe pairs of
trapdoors and document identifiers on the communication channel between client
and server.


For both attackers, the $i$-th observation is a tuple $(td_q, R_{td_q})$ considering
conjunctive keyword queries, where trapdoor $td_q$ corresponds to $d$ conjunctive keywords.

² Note: $d = 1$ refers to a single-keyword search scheme.


3.4 Attacker Knowledge 


It is assumed that the attacker knows the number of keywords $d$ that are allowed
to construct trapdoors. Moreover, it can be assumed that an honest-but-curious
attacker knows the byte size of the stored documents and the number of documents
stored (e.g. from the index). An eavesdropper, however, does not. In that
case we make use of the formula proposed in [6] that approximates the number
of documents stored on the server ($n_{real}$) from the attacker’s knowledge.

We consider two types of attack setups, i.e. a similar-documents attack setup
where the attacker has access to a set of similar documents (as formalized in [6])
and a known-documents attack setup where the attacker has (partial) knowledge
of the documents stored on the server.


Similar-Documents Attack. In our similar-documents attack we assume the
attacker has a document set $D_{similar}$ that is $\epsilon$-similar to the real indexed
document set $D_{real}$. However, we assume $\epsilon$-similarity (as formalized in [6]) over
the possible keyword conjunctions rather than keywords, where a smaller $\epsilon$ means
more similar. Also, $D_{similar} \cap D_{real} = \emptyset$, i.e., the two sets have no
documents in common.


Known-Documents Attack. As in [3,10], for our known-documents attack
setup we assume that the attacker has a p-known document set $D_{p\text{-}known}$, where
$0 < p < 1$ defines the known-documents rate. That is, the attacker knows a
fraction $p$ of the real indexed document set $D_{real}$ stored on the server.

It should be noted that a similar-documents attack can be considered more
realistic than a known-documents attack, as discussed by Damie et al. [6]:
a known-documents attack will most likely only be possible after a data breach,
whereas documents that are only similar to the actual indexed documents may
even be publicly available. Moreover, the user could remove the leaked documents
that are used in a known-documents attack from the index.

The assumption that the attacker knows (a subset of) the documents stored
on the server is rather strong, but is based on what is done in previous work
[3,10].


4 CKWS-Adapted Refined Score Attack 


In this section we describe our conjunctive keyword search (CKWS) adaptation 
of the refined score attack. Our adaptation builds upon the score attacks that 
were introduced by Damie et al. [6]. We have chosen to use their query-recovery
attack against single-keyword search schemes, since it is, to the best of our
knowledge, the most accurate similar-documents attack that has been described
so far. Furthermore, the matching algorithm used in their attack only has a
runtime of 20 s while considering a vocabulary size of 4000 keywords. Since the
space of possible queries increases combinatorially, we have to consider many
possible keyword conjunctions and thus fast runtimes are desired. Moreover, their
attack can use either known documents or similar documents as the adversary’s
knowledge. We describe how one can transform their query-recovery attack into
an attack on conjunctive keyword search schemes, i.e. considering the (abstract)
secure $d$-conjunctive keyword search scheme described in Sect. 3.2, using similar
terminology as in [6].

In addition, the code for the score attacks has been made publicly available 
online by Damie et al. This allowed us to verify their results first before adapting 
it to our conjunctive keyword setting. 


4.1 Score Attacks 


Damie et al. [6] first propose the score attack based on the idea of ranking potential 
keyword-trapdoor mappings according to a score function. To run the score attack 
an attacker calculates the word-word co-occurrence matrix from its auxiliary docu- 
ment set and constructs a trapdoor-trapdoor co-occurrence matrix from observed 
queries and their result sets. Assuming some known queries, the attacker removes 
the columns from both matrices that do not occur in their set of known queries (i.e. 
word-trapdoor pairs) to obtain so-called sub-matrices. Then for every (observed) 
trapdoor, it goes through all possible keywords extracted from the auxiliary doc- 
ument set and returns the keyword for which their score function is maximized. 

Secondly, their proposed refined score attack builds upon the previously described
score attack. Instead of returning a prediction for all trapdoors, they define a
certainty function for each prediction and only keep the RefSpeed best predic- 
tions according to this certainty function. These predictions are then added to 
the set of known queries and the attacker recomputes the co-occurrence sub- 
matrices. This procedure is repeated until there are no predictions left to make, 
i.e. no unknown queries left. 


4.2 Generic Extension 


In short, our generic extension proposes to replace single keywords with keyword
conjunction sets. The extension consists of five steps, described in the
next five paragraphs, that adapt a passive query-recovery attack against
single-keyword search to conjunctive keyword search; it applies to attacks that try
to find a mapping between co-occurrences of keywords and trapdoors to recover
queries. We describe our extension in a similar-documents attack setup using
$D_{similar}$, but the same steps can be taken in a known-documents attack setup
using $D_{p\text{-}known}$ as the attacker’s auxiliary document set.


Extract Vocabulary. First, the attacker extracts keywords from the set of
documents $D_{similar}$ into vocabulary $W_{similar}$. As in query-recovery attacks on
single-keyword search [3,6,10], we also assume that the keyword extraction method
used by the attacker is the same as the one used by the user when she created
the encrypted index.


Construct Set of Possible Keyword Conjunctions. The attacker creates
the set of all possible keyword conjunctions
$K_{similar} = \{ckw_i \in \mathcal{P}(W_{similar}) \mid |ckw_i| = d\}$, where
$m_{similar} = |K_{similar}| = \binom{v_{similar}}{d}$ and $\mathcal{P}(X)$ denotes the power set
of set $X$.
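As a minimal sketch of this step, the set of conjunctions can be enumerated with the standard library. The helper name `keyword_conjunctions` and the toy vocabulary are assumptions for illustration; conjunctions are represented as sorted tuples so that each one has a fixed position.

```python
from itertools import combinations
from math import comb


def keyword_conjunctions(vocabulary, d):
    """Return all d-keyword conjunctions as sorted tuples, in a fixed order."""
    return list(combinations(sorted(vocabulary), d))


# Hypothetical toy vocabulary with v = 4 keywords and d = 2.
vocab = ["contract", "energy", "gas", "meeting"]
k_similar = keyword_conjunctions(vocab, 2)
m_similar = len(k_similar)  # equals comb(len(vocab), 2)
```

The count grows as the binomial coefficient of the vocabulary size and $d$, which is why the attack becomes expensive quickly: a vocabulary of 150 keywords already yields `comb(150, 2) = 11175` conjunctions for $d = 2$.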


Compute Co-occurrence Matrix for Keyword Conjunctions. From
$D_{similar}$ and the derived keyword conjunctions set $K_{similar}$ the attacker creates the
$n_{similar} \times m_{similar}$ matrix $ID_{similar}$. Here $ID_{similar}[i, j] = 1$ if the $i$-th document
in $D_{similar}$ contains the keywords that are in keyword conjunction $ckw_j$ and
is otherwise 0. Then the attacker computes the ckw-ckw co-occurrence matrix
$C_{ckw} = ID_{similar}^T \cdot ID_{similar} \cdot \frac{1}{n_{similar}}$.³
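The following pure-Python sketch shows how the document-conjunction matrix and the ckw-ckw co-occurrence matrix could be computed on toy data. The helper names are hypothetical, and dividing by the number of documents is an assumption of this sketch (it follows the usual normalization of co-occurrence counts to frequencies); a real implementation would use GPU-accelerated matrix products instead of nested loops.

```python
from itertools import combinations


def conjunction_matrix(docs, conjunctions):
    """ID[i][j] = 1 if document i contains every keyword of conjunction j."""
    return [[1 if set(c) <= doc else 0 for c in conjunctions] for doc in docs]


def cooccurrence(id_matrix):
    """Return ID^T . ID / n (assumed normalization by the document count n)."""
    n, m = len(id_matrix), len(id_matrix[0])
    return [
        [sum(row[a] * row[b] for row in id_matrix) / n for b in range(m)]
        for a in range(m)
    ]


# Hypothetical toy document set: each document is a set of keywords.
docs = [
    {"gas", "energy", "contract"},
    {"gas", "energy"},
    {"contract"},
    {"gas", "contract"},
]
conjunctions = list(combinations(["contract", "energy", "gas"], 2))
id_similar = conjunction_matrix(docs, conjunctions)
c_ckw = cooccurrence(id_similar)
```

Entry `c_ckw[a][b]` is the fraction of documents in which conjunctions `a` and `b` both occur, so the diagonal holds the frequency of each individual conjunction.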

Compute the Trapdoor-Trapdoor Co-occurrence Matrix. We define
$Q = \{td_1, \ldots, td_l\}$ to be the set of queries observed by the attacker, containing
trapdoors that have been queried by the user. These trapdoors were created by the
user from keyword conjunctions in $K_{real} = \{ckw_i \in \mathcal{P}(W_{real}) \mid |ckw_i| = d\}$. Let
$R_{td} = \{id(D) \mid (ckw \in K_{real}) \wedge (td = \mathrm{Trapdoor}(K, ckw)) \wedge (D \in D_{real}) \wedge \forall_{kw_t \in ckw} (kw_t \in D)\}$
be the set of document identifiers that were observed by the attacker for trapdoor
$td$. Then we define the set of document identifiers $DocumentIDs = \bigcup_{td \in Q} R_{td}$
of size $s$, where $s \le n_{real}$. Similar to the construction of the matrix $ID_{similar}$,
we construct the $s \times l$ trapdoor-document matrix $ID_{real}$, where $ID_{real}[i, j] = 1$ if
the $i$-th document identifier occurs in $R_{td_j}$ (and $td_j$ refers to the $j$-th trapdoor
from $Q$). Otherwise, $ID_{real}[i, j] = 0$. Then the trapdoor-trapdoor co-occurrence
matrix is $C_{td} = ID_{real}^T \cdot ID_{real} \cdot \frac{1}{n_{real}}$.
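A small sketch of this step, under the assumption that observations are available as a mapping from trapdoor to the set of returned document identifiers, and that the counts are normalized by the (known or estimated) number of stored documents. The function name and the toy observations are hypothetical.

```python
def trapdoor_cooccurrence(observations, n_real):
    """Build the trapdoor-trapdoor co-occurrence matrix from observations.

    observations: dict mapping a trapdoor to the set of returned document ids.
    n_real: (estimated) number of documents stored on the server.
    """
    trapdoors = sorted(observations)
    doc_ids = sorted(set().union(*observations.values()))
    # s x l matrix: rows are observed document ids, columns are trapdoors.
    id_real = [
        [1 if doc in observations[td] else 0 for td in trapdoors]
        for doc in doc_ids
    ]
    num_td = len(trapdoors)
    c_td = [
        [sum(row[a] * row[b] for row in id_real) / n_real for b in range(num_td)]
        for a in range(num_td)
    ]
    return trapdoors, c_td


# Hypothetical observations: three trapdoors and their result sets.
observed = {"td_a": {1, 2, 5}, "td_b": {2, 5}, "td_c": {7}}
trapdoors, c_td = trapdoor_cooccurrence(observed, n_real=10)
```

Note that the attacker never needs the plaintext of the queries here; the matrix is derived entirely from which document identifiers are returned together.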

Apply Attack. The last step is to apply a passive query-recovery attack using 
the set of keyword conjunctions and the co-occurrence matrices. 


4.3 Transform Key Steps of Refined Score Attack 


As in [3,6,10], our attack also requires the attacker to have knowledge of a set of
known queries. However, our set of known queries is slightly different because of
the keyword conjunctions. In a similar-documents attack setup our set of known
queries is $KnownQ = \{(ckw_i, td_{known}) \mid (ckw_i \in K_{similar} \cap K_{real}) \wedge (td_{known} \in Q) \wedge (td_{known} = \mathrm{Trapdoor}(K, ckw_i))\}$.
For our known-documents attack setup, $KnownQ$
is similarly defined by replacing $K_{similar}$ with $K_{known}$.

We recall key steps in the score attack w.r.t. the projection of the keyword- 
keyword co-occurrence and trapdoor-trapdoor co-occurrence matrix to sub- 
matrices using the set of known queries. These steps are important because they 
are different for our CKWS-adapted refined score attack. In short, the projection 
is done by only keeping the columns of known queries in $C_{ckw}$ and $C_{td}$.


³ $A^T$ denotes the transpose of matrix $A$.

Our goal is to generate sub-matrices $C^s_{ckw}$ and $C^s_{td}$ from $C_{ckw}$ and $C_{td}$
respectively. We describe the projection step for $C_{ckw}$ using $K_{similar}$, but the same
holds for $K_{known}$. Recall that $K_{similar} = \{ckw_1, \ldots, ckw_{m_{similar}}\}$.

We define $pos(ckw)$, which returns the position of $ckw \in K_{similar}$. That is,
$pos(ckw_i) = i$. Similarly, $pos(td)$ returns the position of $td$ in $Q = \{td_1, \ldots, td_l\}$.

Let $C_{ckw} = (\ldots, \vec{c}_i, \ldots)_{i \in [m_{similar}]}$ be the $m_{similar} \times m_{similar}$ co-occurrence
matrix, where the column vector $\vec{c}_i$ denotes its $i$-th column. Then the $m_{similar} \times k$
sub-matrix is $C^s_{ckw} = (\ldots, \vec{c}_{pos(ckw_j)}, \ldots)_{(ckw_j, td_j) \in KnownQ}$, where $\vec{c}_{pos(ckw_j)}$ is the
$pos(ckw_j)$-th column vector of $C_{ckw}$ and $k = |KnownQ|$.

Let $C_{td} = (\ldots, \vec{u}_i, \ldots)_{i \in [l]}$ be the $l \times l$ trapdoor-trapdoor co-occurrence matrix,
where the column vector $\vec{u}_i$ denotes its $i$-th column. Then the $l \times k$ sub-matrix $C^s_{td}$
can be constructed as follows: $C^s_{td} = (\ldots, \vec{u}_{pos(td_j)}, \ldots)_{(ckw_j, td_j) \in KnownQ}$, where
$\vec{u}_{pos(td_j)}$ is the $pos(td_j)$-th column vector of $C_{td}$.

Superscript $s$ emphasizes that $C^s_{ckw}$ and $C^s_{td}$ are sub-matrices of $C_{ckw}$ and $C_{td}$
respectively. Also, we denote $C^s_{ckw}[ckw_i]$ to be the $i$-th row vector for keyword
conjunction set $ckw_i$ and $C^s_{td}[td_j]$ to be the $j$-th row vector for trapdoor $td_j$,
where $|C^s_{ckw}[ckw_i]| = |C^s_{td}[td_j]| = k$.

Additionally, we revise the scoring algorithm, for which the score is higher if a
trapdoor corresponds to a certain keyword conjunction, i.e. the distance between
the two vectors $C^s_{td}[td_j]$ and $C^s_{ckw}[ckw_i]$ is small. Using keyword conjunctions the
score function is defined as: $Score(td_j, ckw_i) = -\ln(\|C^s_{ckw}[ckw_i] - C^s_{td}[td_j]\|)$, for
all $ckw_i \in K_{similar}$ (or $K_{known}$) and all $td_j \in Q$, where $\ln(\cdot)$ is the natural log and
$\|\cdot\|$ is a vector norm (e.g. the L2 norm).
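The projection and the score function can be sketched as follows, using small made-up matrices and the L2 norm. The helper names, the toy values, and the handling of a zero distance (returning infinity) are assumptions of this sketch and are not specified in the text.

```python
import math


def project(matrix, known_positions):
    """Keep only the columns at known_positions (one per known query)."""
    return [[row[p] for p in known_positions] for row in matrix]


def score(ckw_row, td_row):
    """Score = -ln(||ckw_row - td_row||) using the L2 norm."""
    dist = math.dist(ckw_row, td_row)
    return -math.log(dist) if dist > 0 else math.inf


# Made-up co-occurrence matrices for 3 conjunctions and 2 trapdoors.
c_ckw = [[0.5, 0.2, 0.1], [0.2, 0.4, 0.3], [0.1, 0.3, 0.6]]
c_td = [[0.4, 0.29], [0.29, 0.5]]
# One known query, given as (position in K_similar, position in Q).
known = [(1, 0)]
cs_ckw = project(c_ckw, [p for p, _ in known])
cs_td = project(c_td, [p for _, p in known])
# Score every conjunction against the unknown trapdoor at position 1.
scores = [score(row, cs_td[1]) for row in cs_ckw]
best = max(range(len(scores)), key=scores.__getitem__)
```

Here the third conjunction wins because its co-occurrence with the known conjunction (0.3) is closest to the trapdoor's co-occurrence with the known trapdoor (0.29).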


4.4 Revised Algorithm 


We replace $C^s_{kw}$ in [6] with $C^s_{ckw}$ to transform the refined score attack into the
CKWS-adapted refined score attack. Algorithm 1 contains its pseudocode, where
a step is highlighted blue if it is different from the refined score attack proposed
by Damie et al. [6]. Note that this algorithm is described using $K_{similar}$, but also
works for $K_{known}$ as input.

One iteration of the algorithm consists of three key phases. First,
remove known queries from the observed queries set $Q$. Secondly, find the best
scoring keyword conjunction candidate for each unknown query and compute the
certainty of this candidate. Using keyword conjunctions, the certainty of a keyword
conjunction candidate $ckw_i$ for trapdoor $td$ is defined by:
$Certainty(td, ckw_i) = Score(td, ckw_i) - \max_{j \neq i} Score(td, ckw_j)$.

Using this definition the certainty of a correct match of keyword conjunction 
with a trapdoor is higher when the score of the match is much higher than all 
other possible candidate scores. 
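As a small illustration of this definition, the certainty of the best candidate is simply the gap between the highest and the second-highest score. The helper below is a hypothetical sketch that assumes at least two candidates and uses made-up scores.

```python
def certainty(scores):
    """Return (best index, best score minus the runner-up score)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, scores[best] - scores[runner_up]


# Made-up candidate scores for two trapdoors.
clear = certainty([4.6, 2.3, 2.2])      # one candidate stands out
ambiguous = certainty([4.6, 4.5, 2.2])  # two candidates score almost equally
```

Both trapdoors have the same best candidate, but only the first prediction would be considered certain enough to be promoted to a known query early on.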

The algorithm defines a notion of refinement speed (RefSpeed), which is
the number of most certain predictions that are added to the set of known
queries in each iteration of the algorithm. This is the third and last key phase
of an iteration, i.e. adding the most certain predictions to the known queries and
recomputing the sub-matrices $C^s_{ckw}$ and $C^s_{td}$. Thereafter, either start a new iteration
or stop the algorithm if the number of unknown queries is less than RefSpeed.

Algorithm 1: CKWS-adapted refined score attack.

Input: $K_{similar}$, $C_{ckw}$, $Q$, $C_{td}$, KnownQ, RefSpeed
Result: List of keyword conjunctions as predictions for trapdoors with certainty
final_pred ← [];
unknownQ ← Q;
while unknownQ ≠ ∅ do
    // Set remaining unknown queries.
    unknownQ ← {td : (td ∈ Q) ∧ (∄ ckw ∈ $K_{similar}$ : (ckw, td) ∈ KnownQ)};
    temp_pred ← [];
    // Propose a prediction for each unknown query.
    forall td ∈ unknownQ do
        cand ← [];
        forall ckw ∈ $K_{similar}$ do
            s ← $-\ln(\|C^s_{ckw}[ckw] - C^s_{td}[td]\|)$;
            Append {“kw”: ckw, “score”: s} to cand;
        end
        Sort cand in descending order according to the score;
        certainty ← score(cand[0]) − score(cand[1]);
        Append (td, cand[0], certainty) to temp_pred;
    end
    // Stop refining or keep refining.
    if |unknownQ| < RefSpeed then
        final_pred ← KnownQ ∪ temp_pred;
        unknownQ ← ∅;
    else
        Add RefSpeed most certain predictions temp_pred to KnownQ;
        Add the columns corresponding to the new known queries to $C^s_{ckw}$ and $C^s_{td}$;
    end
end
return final_pred
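To make the control flow of Algorithm 1 concrete, the following is a compact pure-Python sketch that works on matrix positions rather than on actual trapdoors and conjunctions. It is an illustrative reading of the pseudocode, not the authors' implementation: the function name, the dictionary representation of KnownQ, and the small floor on the distance (to avoid taking the logarithm of zero) are assumptions. It requires at least two candidate conjunctions and `ref_speed >= 1`.

```python
import math


def refined_score_attack(c_ckw, c_td, known, ref_speed):
    """Sketch of Algorithm 1 on index positions.

    c_ckw: m x m conjunction co-occurrence matrix (attacker's documents).
    c_td:  l x l trapdoor co-occurrence matrix (observed queries).
    known: dict {trapdoor position: conjunction position} of known queries.
    Returns a dict {trapdoor position: predicted conjunction position}.
    """
    known = dict(known)
    while True:
        td_cols = list(known)                   # columns kept in C^s_td
        ckw_cols = [known[t] for t in td_cols]  # columns kept in C^s_ckw
        unknown = [t for t in range(len(c_td)) if t not in known]
        preds = []
        for t in unknown:
            td_vec = [c_td[t][c] for c in td_cols]
            cand = []
            for j, row in enumerate(c_ckw):
                dist = math.dist([row[c] for c in ckw_cols], td_vec)
                # Assumed floor so that an exact match gets a finite score.
                cand.append((-math.log(max(dist, 1e-12)), j))
            cand.sort(reverse=True)
            preds.append((cand[0][0] - cand[1][0], t, cand[0][1]))
        if len(unknown) < ref_speed:
            # Stop refining: accept all remaining predictions.
            known.update({t: j for _, t, j in preds})
            return known
        # Keep refining: promote the most certain predictions to known queries.
        preds.sort(reverse=True)
        known.update({t: j for _, t, j in preds[:ref_speed]})


# Idealized toy case: the trapdoor matrix is a permutation of c_ckw.
c_ckw = [
    [0.50, 0.10, 0.20, 0.30],
    [0.10, 0.40, 0.15, 0.05],
    [0.20, 0.15, 0.60, 0.25],
    [0.30, 0.05, 0.25, 0.70],
]
perm = [2, 0, 3, 1]  # trapdoor t was built from conjunction perm[t]
c_td = [[c_ckw[perm[a]][perm[b]] for b in range(4)] for a in range(4)]
recovered = refined_score_attack(c_ckw, c_td, known={0: perm[0]}, ref_speed=1)
```

In this idealized example the attacker's statistics match the server's exactly, so a single known query suffices to recover the full mapping; with merely similar documents the rows differ and the certainty-based refinement decides which predictions to trust first.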


4.5 Complexity 


As in [6], a higher refinement speed will result in a faster runtime, but less
accurate predictions. However, due to our use of keyword conjunctions the number
of candidates for a trapdoor increases for larger $d$. Therefore, the runtime of the
CKWS-adapted refined score attack grows combinatorially. The time complexity
of the attack is given by $O(f(v) + g(v))$, where $f(v) = \binom{v}{d} \cdot (d - 1)$ corresponds
to the time complexity of the generic extension, where we assume multiplying
two vectors takes constant time. Further, $g(v) = \frac{|Q|}{RefSpeed} \cdot |Q| \cdot \binom{v}{d} \cdot k$ is the
time complexity of the attack. For both $f$ and $g$, input $v$ is either $v_{similar}$ or
$v_{known}$ depending on the attack setup.

Besides the increase in runtime, with $d > 1$ the space complexity of
the algorithm also increases faster relative to the vocabulary size. The co-occurrence
matrix $C_{ckw}$ in the similar-documents attack setup is $m_{similar} \times m_{similar}$, which in terms
of vocabulary size is $\binom{v_{similar}}{d} \times \binom{v_{similar}}{d}$, thus increasing faster with larger
$v_{similar}$.

This increase in time and space complexity led us to first further optimize the 
revised algorithm for our implementations. Moreover, we use a GPU to decrease 
runtimes through computing expensive matrix operations on it. 


5 Experiments 


5.1 Setup 


Documents. As described previously, in our experiments we simulate our 
attack using the publicly available Enron email document set introduced by 
Klimt & Yang [12]. We chose this document set since this one is also used in 
most attack papers requiring a set of documents. Similarly, we constructed the 
same corpus of emails from the folder _sent_mail which results in a set of 30109 
documents. 


Keyword Extraction. We extract keywords solely from the contents of the
emails in the dataset, i.e. we do not consider email addresses or email subjects to
be part of the document set. For keyword extraction we use the Porter Stemmer
algorithm [20] to obtain stemmed words; moreover, we remove stop words in the
English language like ‘the’ or ‘a’. This method results in a total of 62976
unique keywords in our entire considered document set.


Number of Keywords in Conjunction. Throughout our experiments we fix
$d$, i.e. the number of keywords allowed in one conjunction, to either 1, 2 or 3.
This means that no mixture of numbers of keywords is allowed in a search. For
instance, when $d = 3$ only queries with 3 distinct keywords are allowed, i.e.
queries that contain either 1 or 2 keywords are not allowed.


Testing Environment. We implemented the attack on an Ubuntu 20.04 server
with an Intel Xeon 20-core processor (64 bits, 2.2 GHz), 512 GB of memory, and
an NVIDIA Tesla P100 GPU (16 GB). We used Python 3.7 and the TensorFlow
library [1] to accelerate matrix operations on a GPU.⁴


⁴ Our code is available at https://github.com/marcowindt/passive-ckws-attack.

Limitations. Running experiments with larger vocabulary sizes requires a lot
of memory, since a vocabulary size of 150 and $d = 2$ means a
document-keyword-conjunction matrix size of 18065 × 11175 (already 1.5 GiB) and a
maximum co-occurrence matrix size of 11175 × 11175 (0.9 GiB), which both have to
fit in the memory of the GPU for fast calculations. Therefore, using vocabulary
sizes similar to those used in the score attack is unrealistic in our generic
extension strategy setting without sufficient resources. However, we propose an
extrapolation strategy to obtain approximate results for larger vocabularies.
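The matrix sizes quoted above can be checked with a few lines of arithmetic. The sketch assumes 8-byte (64-bit) floating-point entries, which is an assumption here but reproduces the stated figures, and takes 18065 as the number of real documents (60% of the 30109 emails).

```python
from math import comb

BYTES_PER_FLOAT64 = 8  # assumed entry size
GIB = 2 ** 30

n_real = 18065    # 60% of the 30109 documents
m = comb(150, 2)  # keyword conjunctions for v = 150, d = 2
doc_matrix_gib = n_real * m * BYTES_PER_FLOAT64 / GIB
cooc_matrix_gib = m * m * BYTES_PER_FLOAT64 / GIB
```

Because the number of conjunctions grows quadratically in the vocabulary size for $d = 2$, the co-occurrence matrix grows with the fourth power of the vocabulary size, which explains why even modest vocabularies exhaust a 16 GB GPU.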


5.2 Results 


In our experiments where similar-documents are used as the attacker’s knowledge,
we use the same ratio of similar (40%) and real (60%) documents as in
[6]. Similar to [3,6,10], we define the accuracy to be the number of correct
predictions divided by the number of unknown queries excluding the initial known
queries, i.e. $accuracy = \frac{|CorrectPredictions(unknownQ)|}{|Q| - |KnownQ|}$.
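The accuracy metric can be expressed as a short function. The names and the toy data are hypothetical; the sketch assumes the ground truth is available for every observed query (as in a simulation) and that all initially known queries are part of the observed set.

```python
def accuracy(predictions, truth, known_q):
    """Fraction of initially unknown queries that were recovered correctly.

    predictions: dict {trapdoor: predicted conjunction}.
    truth: dict {trapdoor: actual conjunction} for every observed query in Q.
    known_q: set of trapdoors that were known to the attacker from the start.
    """
    unknown = [td for td in truth if td not in known_q]
    correct = sum(1 for td in unknown if predictions.get(td) == truth[td])
    return correct / len(unknown)


# Toy example: 5 observed queries, 1 initially known, 3 of 4 recovered.
truth = {"t1": "a&b", "t2": "a&c", "t3": "b&c", "t4": "b&d", "t5": "c&d"}
predictions = {"t1": "a&b", "t2": "a&c", "t3": "b&c", "t4": "b&d", "t5": "a&d"}
acc = accuracy(predictions, truth, known_q={"t1"})
```

Excluding the initially known queries from both numerator and denominator prevents the attacker's prior knowledge from inflating the reported recovery rate.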

If not specified otherwise, each accuracy result corresponds to the average 
accuracy over 50 experiments. Also, the vocabulary used in experiments is always 
created from the most frequently occurring keywords in the document set. From 
this vocabulary the keyword conjunctions set is generated. In each experiment 
it is assumed the attacker has observed 15% of queries that can be performed
by the user, i.e. $|Q| = 0.15 \cdot m_{real}$, where queries are sampled u.a.r. from $K_{real}$ to
construct trapdoors.


Fig. 1. Score attack using similar-documents for varying vocabulary sizes
and initially known queries with $d = 2$, $|D_{real}| = 18K$, $|D_{similar}| = 12K$,
$|Q| = 0.15 \cdot m_{real}$. (Legend: $|KnownQ| = 15$, 30, 60.)

Fig. 2. Frequency of keyword conjunctions ordered from most frequent to least
frequent occurring keyword conjunction in $D_{similar}$.


Result Extrapolation. Figure 1 shows the accuracy of the score attack from [6]
where the attacker has access to similar-documents for varying vocabulary size
and $d = 2$. We show these results to highlight that we can closely extrapolate
the accuracy of the attack in a similar-documents setting, where the
extrapolation is depicted by the dashed line and the measured results by the solid
line. We obtain this extrapolation by first transforming the accuracies using the
logit⁵ function. Using this transformation, we obtain a space in which we seem
to have a linear relationship such that $logit(acc) = b \cdot v_{similar} + a$. We then
perform a linear regression to obtain these coefficients using our experimental
results. Lastly, we use the inverse logit function to transform it back to the
original scale. We make use of this extrapolation where running experiments
becomes infeasible (i.e. experiments with $d = 2$ and $v_{real} > 500$) to extrapolate
the accuracy for larger vocabulary sizes.

⁵ $logit(x) = \log\left(\frac{x}{1 - x}\right)$.

In our linear regressions, we do not provide the coefficient of determination
$R^2$ and the p-value since they are based on the assumption that results are
independent, which is not true in our experiments as they all use the same
document set. Hence, these values should not be used to evaluate the quality of the
model even if they are high (e.g. $R^2 \approx 0.95$ in Fig. 1), but the linear regression
is still valid. Although there may exist more precise extrapolation techniques,
our intention is to have a simple yet realistic approximation of the accuracy for
larger vocabularies for the sake of our discussion.
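The extrapolation procedure (logit transform, linear regression, inverse logit) can be sketched in plain Python as follows. The accuracies used here are synthetic values generated for illustration only and are not results from the experiments; the helper names are likewise hypothetical.

```python
import math


def logit(x):
    """logit(x) = log(x / (1 - x)) for 0 < x < 1."""
    return math.log(x / (1 - x))


def inv_logit(y):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-y))


def fit_line(xs, ys):
    """Ordinary least squares for y = b * x + a; returns (b, a)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return b, mean_y - b * mean_x


def extrapolate(vocab_sizes, accuracies, target_size):
    """Fit logit(acc) = b * v + a and predict the accuracy at target_size."""
    b, a = fit_line(vocab_sizes, [logit(acc) for acc in accuracies])
    return inv_logit(b * target_size + a)


# Synthetic accuracies for illustration only (not results from the paper).
sizes = [50, 70, 90, 110, 130]
accs = [inv_logit(3.0 - 0.01 * v) for v in sizes]
predicted = extrapolate(sizes, accs, 300)
```

Working in logit space keeps every extrapolated value strictly between 0 and 1, which a direct linear fit on the accuracies would not guarantee.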


Frequency of Keyword Conjunctions. Figure 2 shows the frequency of a
keyword conjunction occurring in $D_{similar}$ for $d \in \{1, 2, 3\}$, where the keyword
conjunction rank is lowest for the most frequent keyword conjunction. We observe
the effect of using keyword conjunctions instead of single keywords, i.e. the
frequency of the most frequent keyword conjunction becomes smaller with higher
$d$ and the frequency of the least frequent keyword conjunction reaches almost
zero. This is to be expected, since the larger the vocabulary size, the higher the
probability that certain keywords from a keyword conjunction do not appear in any
document together, considering that the vocabulary is generated with the most
frequent keywords first. Note, however, that the frequency for ranks between 200
and 3600 is higher for $d = 2$ relative to $d = 1$, which is due to the fact that
obtaining 4000 keyword conjunctions requires a smaller vocabulary size of 90 for
$d = 2$, and it is still the case that the most frequent keywords occur together.
Nevertheless, the same does not hold for $d = 3$ relative to $d = 2$, where we
actually observe a decrease in keyword conjunction frequency. Here, the most
frequent keywords used to create a keyword conjunction of 3 keywords do not
necessarily occur together in a document.

CKWS-Adapted Refined Score Attack Using Similar-Documents.
Figure 3 shows the accuracy of the CKWS-adapted refined score attack using
similar-documents with $d = 2$ and varying vocabulary size. Also, the plot shows
an extrapolation of the accuracies for vocabulary sizes larger than 130 (and
smaller than 50). From the extrapolation of the accuracies for varying
vocabulary sizes we clearly see a rapid decrease in accuracy with larger vocabulary
sizes. We conclude that, with 30 known queries, we can still reach a reasonable
recovery rate above 50% for vocabulary sizes of 300 to 400 keywords. However,
the results are far from the single-keyword search setup presented in [6], which
achieves up to 85% recovery rate for a vocabulary size of 1000.

In [6], the authors discussed how the ‘quality’ of a known query influences the
accuracy. A known query is more qualitative if the underlying keyword occurs more
frequently. We recall that in the CKWS-adapted setting, choosing qualitative
known queries is a way to reduce the number of known queries needed. The lower
the rank of a keyword conjunction in Fig. 2, the more qualitative the query for
that keyword conjunction is considered.

Figure 4 shows the accuracy of the CKWS-adapted refined score attack using
similar-documents with d = 2 and varying number of known queries. The plot 
shows that the standard deviation of the accuracy, assuming 5 or 10 known 
queries, is relatively high compared to the standard deviation for 15, 30, or 60 
known queries. For 5 known queries the standard deviation is 0.15, which is at 
least 3 times higher than the standard deviation for 15 known queries (~0.05). 
The accuracy increases and standard deviation decreases with a higher number 
of known queries, since it becomes more likely to pick more qualitative queries 
(u.a.r.). This also explains why we observe this noisy behavior of the accuracy 
in the plot. 

CKWS-Adapted Refined Score Attack Using p-Known-Documents.
Since we have shown in Sect. 5.2 that the CKWS-adapted refined score attack
provides only limited scaling for $d > 1$, we explore how well the attack
performs assuming known-documents as the attacker’s knowledge. Figure 5
shows the accuracy of the attack using known-documents with varying
known-documents rates of $0.05 < p < 0.8$ and steps of 0.05. We observe that with the
initial $|KnownQ| = 10$ setting the attack reaches higher accuracies at lower
known-documents rates than in a setting with initially $|KnownQ| = 5$.
Also, with known-documents rates $p > 0.7$ the accuracy of the attack
becomes constant and reaches near 100% accuracy for both 5 and 10 known
queries. However, we do note that a vocabulary size of $v_{real} = 130$ is a
rather limited setting. Next, we explore the attack using known-documents
with larger vocabularies.


CKWS-Adapted Refined Score Attack Using 0.7-Known-Documents.
In the previous result with varying known-documents rates we observed that the
accuracy of the attack using known-documents reaches near 100% for
known-documents rate $p = 0.7$ for both 5 and 10 known queries. Here we explore
the accuracy of the attack by fixing the known-documents rate to $p = 0.7$ with
vocabulary sizes 250 and 500. Figure 6 shows a bar plot for both these results, with
error bars describing the standard deviation of the accuracy over 50 experiments.
We observe that for vocabulary size 250 the difference between an attack using
5 known queries and one using 10 known queries is small. Also, the standard
deviation in both settings is small. However, for the 500-keyword setting we
clearly see a decrease in accuracy and a large standard deviation using 5 known
queries, whereas for 10 known queries the attack still reaches above 93%
accuracy and the standard deviation is small. We do note, however, that in this case
an attacker has a great advantage, since it knows at least 70% of the whole indexed
dataset and 10 known queries. In comparison, previous passive query-recovery
attacks [3,10] on single-keyword search did not exceed 40% accuracy assuming
a known-documents rate of 0.8.

Fig. 7. Runtime of the CKWS-adapted refined score attack using known-documents
w.r.t. vocabulary size, with $d = 2$ and $p = 0.7$.


Runtime and Memory Usage. Figure 7 shows the average runtime of the
attack using known-documents over 50 repetitions as a function of $v_{real}$ for $d = 2$.
We observe that the runtime is high even for fairly small vocabulary sizes,
which is to be expected considering the time complexity described in Sect. 4.5.
We only show the runtime of the attack using known-documents; however, the
runtime of the attack using similar-documents is similar. Although our runtime
could in principle benefit from using multiple GPUs, and our code is written to
support this, we found that using two GPUs does not necessarily speed up our
attack due to large overhead.

The overall memory usage is dominated by the size of the co-occurrence matrices
$C_{ckw}$ and $C_{td}$. Therefore, we can define the main memory usage of the attack by
the size of these two matrices as a function of the vocabulary size and the number
of queries observed. In our experiments we always assume the attacker observes
$|Q| = 0.15 \cdot m_{real}$ queries. As a result, an accurate estimate of the bytes used
by one experiment is given by
$numberOfBytes(v_{real}, d) = 2 \cdot (0.15 + 0.15^2) \cdot \binom{v_{real}}{d}^2 \cdot sizeof(float)$,
where $sizeof(\cdot)$ returns the number of bytes used by the system to
store a certain data type. Filling in $v_{real} = 500$, $d = 2$ and using 64-bit floats,
$numberOfBytes(500, 2) \approx 40$ GiB, whereas the GPU used in our experiments holds
at most 16 GB, meaning that batching intermediate results is already required.
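The memory estimate can be reproduced numerically. The sketch below reads the formula as $2 \cdot (0.15 + 0.15^2) \cdot \binom{v_{real}}{d}^2 \cdot sizeof(float)$, an interpretation that yields the stated figure of roughly 40 GiB for $v_{real} = 500$ and $d = 2$ with 8-byte floats; the function and parameter names are hypothetical.

```python
from math import comb


def number_of_bytes(v_real, d, observed_fraction=0.15, float_size=8):
    """Estimated bytes needed by one experiment for the given vocabulary size."""
    m_real = comb(v_real, d)
    return 2 * (observed_fraction + observed_fraction ** 2) * m_real ** 2 * float_size


gib = number_of_bytes(500, 2) / 2 ** 30
```

Halving the float size to 32 bits would halve the estimate, which is one of the few simple levers available before batching becomes unavoidable.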


6 Discussion 


Runtime. Although requiring large co-occurrence matrices for the extended
refined score attack is cumbersome, if the adversary has sufficient memory
resources these large matrices will not be her only concern. Her main concern
will be the runtime of the attack: without being able to parallelize our
attack across multiple GPUs, it is difficult to run for vocabulary sizes > 500
and becomes infeasible for vocabulary sizes > 1000, whereas the added time
complexity of our extension strategy is relatively small.


Observed Queries. Furthermore, the question arises whether it is realistic for
an attacker to observe 15% of all possible queries. With only single-keyword
search we believe this can be achieved. However, with $d = 2$ the number of
keyword conjunctions to be observed is large, i.e. $0.15 \cdot \binom{v_{real}}{2}$. Although a
smaller percentage could be considered more realistic and would even decrease the
runtime of the attack, a larger $|Q|$ is still desired, since it will result in better
estimators for prediction and thus higher accuracies.


Query Distribution. In our experiments we only sampled queries using a 
uniform distribution. However, it is likely that this is unrealistic for keyword 
conjunctions, since certain keywords might be more likely to be used in a query 
together whereas other possible conjunctions might not be queried at all. Having 
knowledge of whether certain keywords are more likely to be searched for in 
conjunction would decrease the complexity of the attack, since one can then 
only consider the top most likely keyword conjunctions. 


Countermeasure. Previous query-recovery attacks on single-keyword search
also describe a countermeasure against their attack. In our work we focus on
the question of whether a generic extension is possible. However, because of our
generic extension strategy, the countermeasures tested in [6] will be applicable,
but they were not explored. Also, most introduced countermeasures do not actually
leak less information; they make the leakage unusable by the attack proposed in
the corresponding work (e.g. adding false positives in the result set).


Generic Extension. Although we described an adapted version of the refined 
score attack by [6] for a conjunctive keyword setting, since it performs well 
with low runtimes for single keywords, our generic extension strategy using 
keyword conjunction sets is also valid for other attacks [2,3,10] and even other 
types of attacks (e.g. attacks using query frequency [14,16]). However, we expect 
similar runtime issues due to the large query space. Blackstone et al. [2] 
describe an algorithm using cross-filtering that could be a helpful basis for an 
attack specifically against conjunctive keyword search. 


7 Conclusion 


In this work we presented a generic extension strategy to adapt any passive 
query-recovery attack to a conjunctive keyword search setting. We specifically 
explored its applicability by adapting the refined score attack proposed by 
Damie et al. [6] to a conjunctive keyword search setting. It is the first study 
of passive query-recovery attacks in the conjunctive keyword search setting. We 
showed that our attack, when using documents that are similar to, but otherwise 
different from, the indexed documents on the server, achieves an accuracy of 
only 32% against conjunctive keyword search. However, the adapted attack using 
known documents can still perform well with a low number of known queries and a 
vocabulary size of 500, and achieves a recovery rate similar to previous passive 
query-recovery attacks [3,10,15] against single-keyword search. 

Further, we discussed that the time complexity of the adapted attack grows 
combinatorially with the number of keywords in the conjunctive search query. 
Also, the storage required to perform the attack is dominated by the size of 
the co-occurrence matrices computed from the attacker’s knowledge, which also 
increases combinatorially. 


Abstract. We present a simple yet potentially devastating and hard-to-detect 
threat, called Gummy Browsers (named after “Gummy Fingers”, which can 
impersonate a user’s fingerprint biometrics), whereby the browser 
fingerprinting information can be collected and spoofed without the victim’s 
awareness, thereby compromising the privacy and security of any application 
that uses browser fingerprinting. 

We design and implement the Gummy Browsers attack using three 
orchestration methods based on script injection, browser settings and 
debugging tools, and script modification, which can successfully spoof a 
wide variety of fingerprinting features to mimic many different browsers 
(including mobile browsers and the Tor browser). We then evaluate 
the attack against two state-of-the-art browser fingerprinting systems, 
FP-Stalker and Panopticlick. Our results show that an attacker A can 
accurately match his own manipulated browser fingerprint with that of any 
targeted victim user U for a long period of time, without significantly 
affecting the tracking of U and while collecting U’s fingerprinting 
information only once. The TPR (true positive rate) for the tracking of 
the benign user in the presence of the attack is larger than 0.9 in most 
cases. The FPR (false positive rate) for the tracking of the attacker is 
also high, larger than 0.9 in all cases. We also argue that the attack can 
remain completely unnoticed by the user and the website, thus making it 
extremely difficult to thwart in practice. 


Keywords: Web security · Browser fingerprinting · Spoofing attack 


1 Introduction 


Many websites and web services leverage browser fingerprinting techniques to 
track their users for various purposes, including targeted advertisements [33] based 
on browsing history and habits, user authentication [1,5,7], and fraud detection 
[6,30]. Browser fingerprinting aims to uniquely identify web browsers. 
Specifically, browser fingerprinting uses a stateless identifier for web 
browsers composed of a set of browser and system attributes, including browser 
vendor and version, plugins and extensions, canvas rendering, available fonts, 
performance characteristics, platform, clock skews and screen resolutions. 
These attributes are collected through JavaScript APIs and HTTP headers. 

Based on different combinations of browser and system attributes, and their 
uniqueness to the browser, researchers and practitioners have proposed a myriad 
of browser fingerprinting techniques [9,10,13,14,17,21,23,26,29,41-43,45-48]. 
However, the uniqueness of the fingerprint alone is not sufficient for prolonged 
user tracking because the browser fingerprint changes over time, for example 
when the browser is updated or configured differently [52]. For successful 
long-term user tracking, changes to the fingerprints need to be tracked to link 
the current fingerprint with previously recorded fingerprints [27,52], using 
what is referred to as a tracking technique. 

The fingerprint linking algorithm Panopticlick, proposed by Eckersley [27], 
and FP-Stalker, developed by Vastel et al. [52], are representative 
instantiations of such tracking techniques. Panopticlick showed that its 
visitors can be uniquely identified from a fingerprint composed of only eight 
browser and system attributes. It follows a very simple heuristic based on the 
comparison of the string representation of browser characteristics. FP-Stalker 
consists of two variants of fingerprint linking algorithms: a rule-based variant 
and a hybrid variant, which leverage a ruleset and machine learning algorithms. 
These algorithms aim to link browser fingerprint evolutions for tracking the 
user. The experiment conducted in the FP-Stalker paper [52] showed that its 
linking algorithm, especially the hybrid variant, can track a given browser 
instance for a long period of time, significantly better than Panopticlick. 

In this paper, we closely investigate the potential privacy leakage and security 
vulnerability associated with state-of-the-art browser fingerprint linking algo- 
rithms, Panopticlick and FP-Stalker to be specific, motivated by their very 
appealing applications and practicality features. Unfortunately, we identify a 
significant threat vector against such linking algorithms. Specifically, we find 
that an attacker can capture and spoof the browser characteristics of a victim’s 
browser, and hence can “present” its own browser as the victim’s browser when 
connecting to a website. The browser attributes can be easily captured (one- 
time or frequently based on the application) by luring the victim into visiting a 
benign-looking website controlled by the attacker (or a malicious website). Then, 
all (or most of) these attributes can be spoofed (once, or continually based on 
the intended level of adversarial impact on the victim), for example, by inject- 
ing a web script, modifying the existing web script, or utilizing the browser’s 
built-in settings and debugging tools. By spoofing the victim’s browser charac- 
teristics, which are used to construct its fingerprint, the attacker’s browser would 
be recognized as the victim’s browser when visiting a targeted website. 

Exploiting this general threat, we introduce Gummy Browsers, an attack system 
that can fully compromise the security and privacy of the schemes that leverage 
browser fingerprinting techniques. For instance, if browser fingerprinting is 
employed for personalized and targeted ads, the web server hosting a benign 
website would push the same or similar ads to the attacker’s browser as those 
that would have been pushed to the victim’s browser, because the web server 
considers the attacker’s browser to be the victim’s browser. Based on the 
personalized ads (e.g., related to pregnancy products, medications and brands), 
the attacker can infer various sensitive information about the victim (e.g., 
gender, age group, health condition, interests, salary level, etc.), and even 
build a personal behavioral profile of the victim. Leakage of such personal and 
private information can pose a serious privacy threat to the user. The study of 
Castelluccia et al. [25] has demonstrated that knowledge of the ads the user is 
shown in targeted advertising can indeed leak significant sensitive information 
about the user. Similarly, if browser fingerprinting is used for security 
purposes, such as user authentication and fraud detection (e.g., clickbot 
detection), our fingerprint spoofing attacker can circumvent the security 
functionality of such defensive schemes. The authentication system may be based 
on some other factors beyond browser fingerprinting. In this paper, we only show 
how to defeat the fingerprinting factor. 

Gummy Browsers can remain hidden and invisible to the targeted user and 
the targeted website. Since the capturing and spoofing of the browser attributes 
is done fully transparently and remotely, Gummy Browsers can be launched 
easily and effectively without being noticed by the user or the website. In this 
light, given the fact that browser fingerprinting techniques are getting deployed 
widely in the real world, Gummy Browsers can have a devastating and lasting 
impact on the online privacy and security of the users. Capturing the victim’s 
fingerprinting information just once allows the attacker to spoof the victim for a 
long period of time. The process can be repeated for further impact. Given the 
fundamental nature of the attack, it would be very difficult to defeat. 

Our experiments consider that the website only uses browser fingerprinting 
for tracking, and does not employ cookies (or cookies are blocked by the user). 
Therefore, our attacks and their implications are limited to fingerprint 
spoofing. 


Our Contributions: We believe that our work makes the following contribu- 
tions: 


1. A Novel Threat of Spoofing Browser Fingerprints: We introduce a 
novel and serious threat arising from the use of browser fingerprinting 
techniques to track the user, referred to as Gummy Browsers. Specifically, an 
attacker with the ability to capture and spoof the browser fingerprint can 
learn various personal and sensitive information about the user based on 
personalized ads and compromise the security of browser-fingerprinting based 
defensive applications, such as user authentication and fraud detection. The 
ease with which this threat can be perpetrated is a strength of our work, since 
it can be deployed in the real world even by naive attackers. 

2. Design and Implementation of Gummy Browsers: We provide the 
design and implementation of Gummy Browsers, which enables an attacker to 
glean sensitive information about the user and compromise browser 
fingerprinting based defensive schemes. Gummy Browsers leverages a 
benign-looking fake website to capture the victim’s browser characteristics 
(this could also be a malicious, attacker-controlled website). Gummy Browsers 
then utilizes spoofing methods, such as script injection, script modification, 
or the browser’s built-in settings and debugging tools, to orchestrate the 
attacker’s browser to appear as the victim’s browser. 

3. Evaluation against Notable Fingerprinting Techniques: We employ 
state-of-the-art browser fingerprinting algorithms, specifically Panopticlick 
[27] and FP-Stalker [52], and evaluate the performance of Gummy Browsers 
against them. Based on a dataset of 200+ users, our results show that the 
attacker can successfully spoof the fingerprint of the browser instance to 
match with that of the targeted victim’s browser instance for a long period 
of time without any significant impact on the tracking of the victim. 


2 Background and Related Work 


2.1 Browser Fingerprinting 


Different combinations of the browser and system attributes can be used to 
generate a unique identifier for a given browser, referred to as the browser 
fingerprint. Based on different combinations of attributes, various browser 
fingerprinting techniques have been proposed [17,23,26,41-43,46,47]. These 
attributes can be grouped into three different categories [20] as presented in 
Table 1. 


(C1) Browser-Provided Information: JavaScript APIs can be used to extract 
a wide range of system information, referred to as browser-provided information, 
that can be employed to fingerprint a device. A set of such features is listed 
in the first row of Table 1. The feature set in this category includes software 
and hardware details (e.g., browser/OS vendor and version, system language 
[21], platform [13], user-agent string [23], resolution, etc.), device timezone and 
clock drift [23] from Coordinated Universal Time (UTC), battery information 
[47] (e.g., battery charge level, discharge rate), and password autofill [45] (e.g., 
the password is user-typed or auto-filled by a browser or password manager). 
The information corresponding to WebGL [17], a JavaScript API for rendering 
graphics within web browsers, and WebRTC [22], a set of W3C standards that 
supports browser-to-browser applications, e.g., voice and video chat, can also 
be used to fingerprint a browser. WebGL information includes the WebGL ven- 
dor and version, maximum texture size, supported WebGL extensions, renderer 
strings, etc. WebRTC information includes connected media devices (e.g., web- 
cam and microphones) information. The support for local storage, which enables 
the browser to store data without any expiration [18], and the status of do not 
track, which blocks (or allows) the website from tracking [14] are also often used 
in browser fingerprinting. 
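
As a rough illustration of how such attributes can form a stateless 
identifier, the sketch below hashes a canonical serialization of a few C1 
attributes. The use of SHA-256, the attribute names, and the example values are 
assumptions made for illustration only; the linking algorithms discussed later 
compare attributes individually rather than relying on a single hash. 

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Derive a stateless identifier from browser and system attributes."""
    # Canonical serialization, so that attribute order does not matter.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical C1 attribute values, for illustration only.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/95.0",
    "platform": "Linux x86_64",
    "language": "en-US",
    "resolution": "1920x1080",
    "timezone_offset": -60,
    "do_not_track": "1",
}
reordered = dict(reversed(list(browser.items())))
updated = dict(browser, user_agent="Mozilla/5.0 (X11; Linux x86_64) Firefox/96.0")

print(fingerprint_id(browser) == fingerprint_id(reordered))  # True
print(fingerprint_id(browser) == fingerprint_id(updated))    # False
```

The second comparison shows that a single browser update changes such an 
identifier, which is why long-term tracking needs a linking step. 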


(C2) Inference based on Device Behavior: The device information can also 
be extracted by executing a specially crafted JavaScript code on the browser and 
observing the resulting effect. This category of the fingerprinting features is 
based on the fact that the execution of JavaScript code creates different 
effects based on the software and hardware configuration of the device, and 
hence can be used to infer various characteristics of the device. For instance, 
HTML5 canvas renders the text and graphics differently based on OS, available 
fonts, and the video driver [43]. The elapsed time to execute the JavaScript 
code can be used to infer the performance characteristics of the device [42]. 
Various aspects of a pointing device can be inferred by monitoring the scroll 
events generated by the mouse wheel or touchpad [26]. The browser vendor and 
version can be inferred by testing CSS features [41]. The presence (or absence) 
of different fonts can be inferred by rendering a text with a predefined list 
of fonts [29]. 


Table 1. Three different categories of browser fingerprinting features [20], 
and a summary of how they can be spoofed via our attack. 

| Category | Feature name | Spoofable | Spoofing approach | Detectable by targeted websites |
|---|---|---|---|---|
| C1 | 1. User-agent + - * [23] | Yes | a, b, c | Hard |
| C1 | 2. WebGL information - * [17] | Yes | b | Hard |
| C1 | 3. System time + - * [23] | Yes | a, | Hard |
| C1 | 4. Battery information [47] | Yes | a | Hard |
| C1 | 5. Cookie enabled + - * [10] | Yes | a, b, c | Hard |
| C1 | 6. WebRTC [22] | Yes | b | Hard |
| C1 | 7. Password autofill [45] | Yes | b | Hard |
| C1 | 8. Platform - * [13] | Yes | a, | Hard |
| C1 | 9. Language + - * [21] | Yes | a, b, c | Hard |
| C1 | 10. Local storage + - * [48] | Yes | b | Hard |
| C1 | 11. Resolution + - * [9] | Yes | a, b | Hard |
| C1 | 12. Do Not Track - * [14] | Yes | a, b, c | Hard |
| C2 | 1. HTML5 canvas fingerprinting - * [43] | Yes | b | Hard |
| C2 | 2. System performance [42] | Yes | b | Hard |
| C2 | 3. Font detection [29] | Yes | b | Hard |
| C2 | 4. Scroll wheel fingerprinting [46] | Yes | b | Hard |
| C2 | 5. CSS feature detection [41] | Yes | b | Hard |
| C3 | 1. Browser plugin fingerprinting + - * [28] | Yes | a, b | Hard |
| C3 | 2. Browser extension fingerprinting [34] | Yes | b | Hard |

I. +: Features used in Panopticlick [27]. -: Features used in Rule-based 
Linking Algorithm [52]. *: Features used in Hybrid Linking Algorithm [52]. 
II. C1: Browser-provided information. C2: Inference based on device behavior. 
C3: Extensions and plugins. 
III. a: Script Injection. b: Script Modification. c: Browser Setting and 
Debugging Tool. 


(C3) Browser Extensions and Plugins: The aforementioned approaches can 
be used to extract information about the browser extensions and plugins to build 
a browser fingerprint. Various browser plugins, e.g., Java, Flash and Silverlight, 
can be queried through JavaScript APIs to reveal system information [28]. For 
instance, Flash can provide the OS kernel version. Both Java and Flash can provide 
an enumerated list of system fonts. An installed NoScript extension (which 
disables JavaScript) and its blacklisted websites can be detected by loading a 
large set of websites. Similarly, AdBlocker can be detected by monitoring 
whether fake ads are loaded on the websites [34]. Other extensions can also be 
detected by other methods. 


2.2 Representative Fingerprinting Techniques 


As mentioned earlier, various browser fingerprinting approaches have been pro- 
posed in the literature, each utilizing a different set of device characteristics. 
Panopticlick [27] and FP-Stalker [52], specifically its Rule-based Linking 
Algorithm and Hybrid Linking Algorithm, are representative browser fingerprint 
linking techniques. 


Panopticlick: Panopticlick [27] leverages eight different browser and system 
attributes to track the user through browser fingerprinting. It categorizes these 
attributes into two groups. The first group contains cookies enabled (C1-5), 
screen resolution (C1-11), time zone (C1-3), and partial supercookie test (e.g., 
local storage, session storage and IE userData) (C1-10). The second group 
contains user-agent (C1-1), HTTP ACCEPT headers (C1-9), system fonts (C2-3), and 
browser plugins information (C3-1). To learn the identity of an unknown 
fingerprint F_u, Panopticlick compares F_u with each of the pre-stored 
fingerprints F_k. If F_u has all eight attributes the same as those of F_k, 
Panopticlick marks them as the same fingerprint, i.e., generated from the same 
browser instance. If any of the attributes in the first group and more than one 
attribute from the second group differs, Panopticlick marks F_u and F_k as 
different fingerprints. In the case where there is only one difference in the 
attribute set from the first group, Panopticlick estimates the similarity score 
of that attribute between F_u and F_k. If the similarity score is higher than a 
set threshold (say 0.85), F_u is marked the same as F_k. In the rest of the 
scenarios, F_u is marked as different from F_k. 
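
The comparison of a single pair of fingerprints can be sketched as follows. 
This is a simplified reading of the rule above, not the original Panopticlick 
code: it assumes that the similarity test is applied whenever exactly one of 
the eight attributes differs, and it uses the ratio of Python's 
difflib.SequenceMatcher as the similarity score. The attribute names and 
example values are hypothetical. 

```python
from difflib import SequenceMatcher

# Illustrative names for the eight attributes (first group, then second group).
ATTRIBUTES = ("cookies_enabled", "resolution", "timezone", "supercookies",
              "user_agent", "http_accept", "fonts", "plugins")


def same_browser(f_u: dict, f_k: dict, threshold: float = 0.85) -> bool:
    """Decide whether unknown fingerprint f_u matches stored fingerprint f_k."""
    differing = [a for a in ATTRIBUTES if f_u[a] != f_k[a]]
    if not differing:
        return True                      # all eight attributes are identical
    if len(differing) > 1:
        return False                     # too many changes to link
    # Assumption: one changed attribute is accepted if its values stay similar.
    attr = differing[0]
    ratio = SequenceMatcher(None, str(f_u[attr]), str(f_k[attr])).ratio()
    return ratio > threshold


stored = {
    "cookies_enabled": True, "resolution": "1920x1080", "timezone": "-60",
    "supercookies": "local,session",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Chrome/96.0.4664.45",
    "http_accept": "text/html", "fonts": "Arial,Courier,Times,Verdana",
    "plugins": "pdf-viewer",
}
updated = dict(stored, user_agent="Mozilla/5.0 (X11; Linux x86_64) Chrome/96.0.4664.93")
moved = dict(stored, timezone="-300", resolution="1366x768")

print(same_browser(updated, stored))  # True
print(same_browser(moved, stored))    # False
```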


Rule-based Linking Algorithm (RLA): This approach for browser fingerprinting 
categorizes the fingerprinting attributes under consideration into three 
sets. The first attribute set consists of operating system (C1-1), platform 
(C1-8), browser name (C1-1), local storage (C1-10), do not track (C1-12), 
cookies enabled (C1-5), and canvas (C2-1). The second set consists of user-agent 
(C1-1), GPU vendor (C1-2), renderer (C1-2), browser plugins (C3-1), system 
language (C1-9) and HTTP accept headers (C1-9). The third feature set consists 
of the resolution (C1-11), time zone (C1-3) and encoding (HTTP header). Similar 
to Panopticlick, RLA compares the aforementioned attributes of an unknown 
fingerprint F_u with each of the stored fingerprints F_k. If all the attributes 
of both fingerprints are the same, RLA marks them as exact fingerprints. If F_u 
and F_k have differences in at least one of the attributes in the first set, RLA 
marks them as different. If F_u has an older version of the browser, the 
algorithm will also mark them as different. Otherwise, it estimates the 
similarity between the remaining attributes from the second and third sets. If 
the similarity score is greater than the set threshold (say 0.75), the algorithm 
counts the number of features that are different between F_u and F_k. All the 
F_k-s that have less than one different attribute from the first set and less 
than two different attributes from the first and second sets are marked as 
candidate fingerprints. If all the F_k-s that have been marked as exact 
fingerprints correspond to the same user, F_u is assigned to that particular 
user. Similarly, if all the F_k-s that have been marked as candidate 
fingerprints belong to the same user, F_u is assigned to that particular user. 
In the rest of the cases, F_u is recognized as a new user. 
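
The rules above can be sketched roughly as follows. This is not the FP-Stalker 
implementation: the attribute names, the browser_version field, the use of 
difflib.SequenceMatcher as the similarity score, and the exact way changed 
attributes are counted are all assumptions made to obtain a small runnable 
example. 

```python
from difflib import SequenceMatcher
from typing import Optional

# Illustrative stand-ins for the three attribute sets described above.
SET1 = ("os", "platform", "browser_name", "local_storage", "do_not_track",
        "cookies_enabled", "canvas")
SET2 = ("user_agent", "gpu_vendor", "renderer", "plugins", "language",
        "http_accept")
SET3 = ("resolution", "timezone", "encoding")


def compare(f_u: dict, f_k: dict, threshold: float = 0.75) -> str:
    """Label stored fingerprint f_k as exact, candidate, or different."""
    if all(f_u[a] == f_k[a] for a in SET1 + SET2 + SET3):
        return "exact"
    if any(f_u[a] != f_k[a] for a in SET1):
        return "different"
    # Assumed field: browser_version as a tuple of ints, e.g. (96, 0).
    if f_u["browser_version"] < f_k["browser_version"]:
        return "different"
    changed = [a for a in SET2 + SET3 if f_u[a] != f_k[a]]
    # Assumption: every changed second-set attribute must remain similar.
    for a in changed:
        ratio = SequenceMatcher(None, str(f_u[a]), str(f_k[a])).ratio()
        if a in SET2 and ratio <= threshold:
            return "different"
    # Assumption: fewer than two changed attributes are tolerated in total.
    return "candidate" if len(changed) < 2 else "different"


def link(f_u: dict, stored: list) -> Optional[str]:
    """stored holds (user_id, fingerprint) pairs; None means a new user."""
    labels = [(uid, compare(f_u, f_k)) for uid, f_k in stored]
    for wanted in ("exact", "candidate"):
        ids = {uid for uid, label in labels if label == wanted}
        if len(ids) == 1:
            return ids.pop()
    return None
```

A fingerprint whose only change is a newer user-agent string would be linked 
to its previous owner as a candidate, while a change in the operating system 
or a browser downgrade would result in a new user. 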


Hybrid Linking Algorithm (HLA): This approach enhances RLA with a 
machine learning technique. HLA divides the browser attributes into two sets. 
The first set consists of the operating system (C1-1), device platform (C1-8), 
browser information (C1-1), local storage (C1-10), do not track (C1-12), cookies 
enabled (C1-5), and canvas (C2-1). The second set contains the following nine 
features: number of changes, system languages (C1-9), HTTP based user-agent 
(C1-1), canvas (C2-1), created time (C1-3), browser plugins (C3-1), fonts (C2-3), 
renderer (C1-2) and resolution (C1-11). HLA compares an unknown fingerprint 
F_u with each of the known fingerprints F_k to give an identity to F_u. F_k is 
assigned to the set “exact” if these two fingerprints have the exact same first 
attribute set, otherwise, to the set “F_ksub”. If all the fingerprints in the 
set exact have the same id, this id is assigned to F_u, otherwise, a new id is 
given to F_u. If there are no fingerprints in the set exact, HLA compares the 
first attribute set of F_u with that of each of the F_k in F_ksub. Each 
attribute comparison results in ‘1’ if the attribute is the same in both F_k and 
F_u, otherwise, ‘0’. If there are less than five different attributes, HLA feeds 
the results to the machine learning model, Random Forest to be specific, 
resulting in a similarity score (in the range of 0 and 1). The F_k having a 
score higher than 0.994 is assigned to the set ‘candidates’. The F_k-s in the 
candidate set are sorted in descending order of the score. If the first score is 
larger than the second one plus 0.1, the id of F_u becomes the top-one id. If 
the top-one and top-two candidates have the same id, this id is assigned to F_u, 
otherwise, a new id is given to F_u. 
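
The candidate selection stage can be sketched as below. The Random Forest is 
replaced by a pluggable score_fn, the feature names are illustrative, and the 
handling of a single remaining candidate is an assumption; the sketch only 
mirrors the thresholds named above and is not the FP-Stalker code. 

```python
from typing import Callable, Optional

# Illustrative attribute names standing in for the compared features.
FEATURES = ("language", "user_agent", "canvas", "plugins", "fonts",
            "renderer", "resolution")


def comparison_vector(f_u: dict, f_k: dict) -> list:
    """1 where an attribute is identical in both fingerprints, else 0."""
    return [1 if f_u[a] == f_k[a] else 0 for a in FEATURES]


def hla_link(f_u: dict, f_ksub: list, score_fn: Callable[[list], float],
             threshold: float = 0.994, margin: float = 0.1) -> Optional[str]:
    """f_ksub holds (user_id, fingerprint) pairs; None means a new id."""
    candidates = []
    for uid, f_k in f_ksub:
        vector = comparison_vector(f_u, f_k)
        if vector.count(0) < 5:              # fewer than five differences
            score = score_fn(vector)         # stand-in for the Random Forest
            if score > threshold:
                candidates.append((score, uid))
    candidates.sort(key=lambda c: c[0], reverse=True)
    if not candidates:
        return None
    if len(candidates) == 1:                 # assumption: a lone candidate wins
        return candidates[0][1]
    (s1, id1), (s2, id2) = candidates[0], candidates[1]
    if s1 > s2 + margin or id1 == id2:
        return id1
    return None
```

In a test setting, score_fn can be any function that maps the 0/1 vector to a 
value between 0 and 1, for example the fraction of matching attributes. 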


2.3 Applications of Browser Fingerprinting 


Targeted Advertising: Browser fingerprinting can be employed to provide 
targeted and personalized ads on user devices (e.g., general desktop PC, 
handheld mobile device) [33]. When a user visits a website, the web server (or 
the online service provider) extracts and stores the browser fingerprint along 
with the user’s browsing behavior. When the user revisits the same website, the 
web server looks for his fingerprint in its repository and pushes the relevant 
ads based on the user’s prior browsing behavior. Besides browser fingerprinting, 
there exist other approaches for targeted advertisements, such as account-based 
targeted ads [50] and cookie-based targeted ads [51]. Unlike these approaches, 
browser fingerprinting neither requires the user to log into his online account 
nor requires the user to enable cookies; rather, it works transparently. 


User Authentication: Various services, such as Oracle [1], Inauth [7] and 
SecureAuth IdP [5], leverage browser fingerprinting to enhance the overall 
security and usability of their authentication mechanisms [39]. Browser 
fingerprinting is usually integrated with existing authentication schemes, such 
as two-factor authentication (2FA) schemes [5]. On successful login, the web 
server captures and stores the browser fingerprint of the device that the user 
has used to log in. The next time the user attempts to log in to the same web 
service using the same device, the current browser fingerprint is matched 
against the stored fingerprints. If they match with a high score, the second 
factor of the 2FA process is dropped (i.e., no need to provide the PIN), and 
merely typing in the password is sufficient to log in. Thus, the browser 
fingerprinting approach to authentication lowers the user effort during the 
authentication process, and hence improves the system’s usability. 
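
The following sketch shows why such a scheme is sensitive to fingerprint 
spoofing. It is a simplified model, not the logic of any of the cited services: 
the match score (fraction of identical attributes), the threshold of 0.9, and 
the attribute values are assumptions. 

```python
def match_score(current: dict, stored: dict) -> float:
    """Fraction of attributes with identical values (illustrative metric)."""
    keys = set(current) | set(stored)
    same = sum(1 for k in keys if current.get(k) == stored.get(k))
    return same / len(keys) if keys else 0.0


def second_factor_required(current: dict, known: list,
                           threshold: float = 0.9) -> bool:
    """True if no stored fingerprint of the account matches well enough."""
    return all(match_score(current, fp) < threshold for fp in known)


# Hypothetical fingerprints of the victim's and the attacker's browsers.
victim = {"user_agent": "victim-UA", "platform": "Win32",
          "language": "en-US", "resolution": "1920x1080"}
attacker = {"user_agent": "attacker-UA", "platform": "Linux x86_64",
            "language": "en-US", "resolution": "1366x768"}
spoofed = dict(victim)  # attacker's browser presenting the victim's values

print(second_factor_required(attacker, [victim]))  # True
print(second_factor_required(spoofed, [victim]))   # False
```

With the spoofed fingerprint, the second factor is skipped and only the 
password remains as a barrier. 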


Fraud Detection: Several security services, e.g., Seon [30] and 
IPQualityScore [6], leverage browser fingerprinting for the purpose of fraud 
detection and prevention in the online setting. Fraud detection techniques can 
be categorized into two groups: supervised and unsupervised methods [24]. The 
supervised method leverages the information from the prior fraud behavior (i.e., 
already marked as fraud) to build a model to infer if the current behavior is 
fraud or non-fraud. The unsupervised method does not rely on the prior 
fraudulent behavior; rather, it sets a baseline for normal behavior. If the 
current behavior significantly deviates from the baseline behavior, the 
unsupervised method marks the behavior as fraudulent. Browser fingerprinting can 
be used to mark the user as a fraudster or a legitimate user. When any of these 
methods finds the user’s behavior fraudulent, the service provider captures and 
flags the browser fingerprint as fraudulent. Since the browser fingerprint 
changes over time, a risk level can be estimated by comparing the browser 
fingerprint against the flagged fingerprints. If the risk level is significantly 
high, the current user is flagged as a fraudster. 

Fig. 1. A high-level overview of the Gummy Browsers attack model. 


3 Attack Model and Spoofing Methods 


3.1 Attack Model 


Gummy Browsers considers a remote adversary who can spoof the victim’s 
browser to a target remote web service. The main goal of Gummy Browsers 
is to fool the web server into believing that a legitimate user is accessing its 
services, so that the attacker can learn sensitive information about the user 
(e.g., interests of the user based on the personalized ads), or circumvent 
various security schemes (e.g., authentication and fraud detection) that rely on 
browser fingerprinting. A high-level overview of the attack is shown in Fig. 1. 

We assume that the attacker has obtained the browser fingerprint of the vic- 
tim. The adversary can easily capture the victim’s fingerprinting information by 
designing a benign-looking website and luring the victim into visiting his website. 
The adversary can leverage the exact mechanism as that of any fingerprinting web- 
site to acquire the browser fingerprint, i.e., via JavaScript APIs. It is also possible 
that a compromised web service, running a malicious script, could acquire the vic- 
tim’s browser fingerprint when the victim visits the attacker-owned website. 

We also assume that before accessing a target web service, the attacker spoofs 
(or injects) the previously acquired victim’s browser information into his own 
fully controlled device to present it as the victim’s device. When the attacker 
visits the target website, the target web server would receive the victim’s 
fingerprint from the attacker’s device. Therefore, to the target web service, it 
looks like the victim is accessing its services, and it cannot recognize the 
malicious attacker. 

We consider three different modes of executing the attack. An adversary can 
retrieve and spoof the victim’s browser fingerprint only once, referred to as 
Acquire-Once-Spoof-Once. Acquire-Once-Spoof-Once can be used to bypass the 
security of the user authentication scheme. Alternatively, to increase the 
impact of the attack, the attacker can spoof the same browser fingerprint 
instance multiple times, a few days apart, referred to as 
Acquire-Once-Spoof-Frequently. Leveraging Acquire-Once-Spoof-Frequently, the 
attacker can track the personalized ads associated with the victim for a long 
period of time, and can infer various sensitive information about the user, and 
even build a personal profile of the victim. Since the browser fingerprint 
changes over time, to increase the attack success rate, the attacker can also 
retrieve and spoof the browser fingerprint multiple times, referred to as 
Acquire-Frequently-Spoof-Frequently. With this approach, the attacker could 
always obtain the latest browser fingerprint of the victim. This can enable the 
attacker to compromise the security of the fraud detection mechanism. 


3.2 Spoofing Methods 


The key component of Gummy Browsers is the ability of the attacker to spoof 
the victim’s browser fingerprint so that the attacker can present its own 
browser to the web service as if it were the victim’s browser. Our spoofing 
methods focus only on the features listed in Table 1; we did not spoof 
network-level features such as the IP address. The attacker can leverage the 
following methods to spoof the fingerprint. 


3.2.1 Script Injection 

In browser fingerprinting, when the browser loads a website, the website executes 
scripts consisting of various JavaScript API calls to extract the browser 
information. To spoof the browser fingerprint, the values extracted by the 
JavaScript API calls should be changed before the browser executes the scripts 
embedded in the website. The objects where these extracted values are stored can 
be overwritten by creating a new object with the same name and constructor as 
that of the original JavaScript APIs. To implement this method, a browser 
extension, a specialized and independent software module for customizing a web 
browser, and/or Selenium [16], a portable framework for testing web 
applications, can be utilized. The browser always loads and executes the scripts 
in the browser extension prior to loading and executing the website’s scripts on 
the client machine. The extension scripts do not change the contents of any 
scripts that are loaded from the visited websites. In the case of Selenium, 
pre-designed scripts are executed, followed by launching the browser, loading 
the website, and executing the embedded scripts. The ability of the browser 
extension and Selenium to execute scripts prior to loading the website allows 
the adversary to overwrite the browser properties extracted through JavaScript 
API calls. An example is listed in [37]. 
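
As a minimal sketch of this idea (not the example from [37]), the Python 
helper below generates a JavaScript snippet that redefines selected navigator 
properties with Object.defineProperty. The helper name and the spoofed values 
are hypothetical, and it is an assumption that all targeted properties can be 
overridden in this way in a given browser. 

```python
import json


def build_override_script(spoofed: dict) -> str:
    """Return JavaScript that makes navigator properties report spoofed values."""
    lines = []
    for prop, value in spoofed.items():
        # json.dumps yields valid JavaScript literals for strings and numbers.
        lines.append(
            f"Object.defineProperty(navigator, {json.dumps(prop)}, "
            f"{{ get: () => {json.dumps(value)}, configurable: true }});"
        )
    return "\n".join(lines)


# Hypothetical values captured from the victim's browser.
victim = {"platform": "Win32", "language": "en-US"}
print(build_override_script(victim))
```

The generated script would have to run before the website’s own scripts, for 
instance from an extension content script or, with Selenium and a 
Chromium-based browser, through the DevTools command 
Page.addScriptToEvaluateOnNewDocument. 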


3.2.2 Browser Setting and Debugging Tool 

Many browsers offer mechanisms, in the form of browser settings and debugging 
tools, that enable their users (the attacker in our case) to change 
various attributes of the client device and the browser. For instance, cookies, 
local storage and “do not track” options can be enabled or disabled simply 
through the browser settings in the Google Chrome browser [40] and the 
“about:config” page in the Firefox browser [4]. Further, the about:config page 
in the Firefox browser allows the user to design his own APIs that can overwrite 
the browser’s predefined APIs. This approach can completely change the browser’s 
attributes. The browser also offers a debugging tool intended for web 
application developers that allows them to debug and improve their web 
application functionality [3]. Using the debugging tool, various browser 
attributes, such as user-agent, geolocation, and cache disabling, can be easily 
changed. The changes affect both the JavaScript API (e.g., navigator.userAgent) 
and the corresponding value in the HTTP header (e.g., the value of the 
user-agent field). The debugging tool allows the browser’s attributes to be 
changed to any custom value, whether it is a predefined valid string or random 
text. 


3.2.3 Script Modification 

The browser properties can also be changed by modifying the scripts embedded
in the website. Once the embedded scripts have extracted the browser
information, it can be changed before the website sends it to the web server.
Using the developer debugging tool (mentioned earlier), a breakpoint can be
set at the beginning of each script of the website so that the script’s
execution pauses at that breakpoint. By inspecting the embedded scripts, each
JavaScript API expression can be replaced with a spoofed value. For instance,
platform = navigator.platform can be replaced with platform = "Win32", which
reports the underlying platform of the device as Win32 instead of the actual
platform. However, each API expression must be changed very carefully, because
an incorrect replacement (i.e., a wrong value or format) can alert the web
service and cause the spoofing to fail.

A more convenient method to spoof the browser information is to leverage the
fact that JavaScript typically uses Ajax (Asynchronous JavaScript And XML) to
transfer the data to the remote server [31]. Since Ajax commonly employs the
JSON (JavaScript Object Notation) [11,12] format when transferring data to the
web server, the browser information can be changed by editing the
corresponding variables in the JSON object. Given that the debugging tool
shows the current variables and their values at each breakpoint, the values
can be changed easily. Once the changes to the scripts are completed, the
breakpoints are removed, allowing the modified scripts to execute. With this
approach, the remote web service receives the spoofed browser attributes. As
the executed scripts are never sent outside the client machine, the
modification goes unnoticed by the remote web server.
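
As a minimal sketch of the payload-editing step described above, the following
Python snippet overwrites selected fields of a JSON-encoded fingerprint before
it would be sent. The helper spoof_payload, the field names, and the sample
values are hypothetical illustrations and do not come from any particular
fingerprinting service. The type check reflects the earlier caution that a
replacement must keep the original value format.

```python
import json


def spoof_payload(payload: str, spoofed: dict) -> str:
    """Return a JSON payload with selected attributes overwritten.

    Hypothetical helper: only keys that already exist are replaced, and a
    replacement must have the same type as the original value so that the
    payload format stays unchanged.
    """
    data = json.loads(payload)
    for key, value in spoofed.items():
        if key not in data:
            raise KeyError(f"attribute {key!r} is not in the payload")
        if type(data[key]) is not type(value):
            raise TypeError(f"attribute {key!r} must stay {type(data[key]).__name__}")
        data[key] = value
    return json.dumps(data)


# Illustrative payload collected on the attacker's machine (made-up values).
original = json.dumps(
    {"platform": "MacIntel", "userAgent": "attacker-UA", "timezoneOffset": -120}
)
modified = spoof_payload(original, {"platform": "Win32", "userAgent": "victim-UA"})
```

Rejecting unknown keys and mismatched types is a design choice of this sketch:
it fails loudly on the attacker's side instead of sending a malformed payload
that the web service could notice.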

Most websites or web services serve obfuscated JavaScript rather than the
original scripts. The purpose of obfuscation is to make the scripts difficult
to understand. JavaScript Obfuscator Tool [2] is an example of such an
obfuscation method. Obfuscation can indeed make script modification harder
than it is for unobfuscated scripts. However, there are JavaScript
deobfuscation methods that can help recover the original scripts. A previous
study [38] and a deobfuscation service [19] have shown that deobfuscation can
work, so obfuscated scripts do not prevent script modification.

We have listed all spoofing approaches for each feature in Table 1. More 
details for spoofing all features are listed in [37]. 


4 Attack Implementation 


4.1 Acquiring User Browser Fingerprint 


To impersonate the victim to the target website, Gummy Browsers needs to
acquire the device fingerprinting information from the victim’s device.
Gummy Browsers employs the following two methods to capture the victim’s
browser fingerprint.


With JavaScript: JavaScript provides a variety of APIs that can be used to
extract the device information. Executing these APIs does not require any
permission from the user [44]. For instance, navigator.platform retrieves the
platform (e.g., MacIntel, Win32, Linux, etc.) of the device that the user is
using, and navigator.cookieEnabled indicates whether cookies are enabled in
the browser. These are exactly the methods deployed by web services that use
browser fingerprinting. All these API calls go unnoticed by the user.


Without JavaScript: Some device fingerprinting attributes can also be
extracted through methods other than JavaScript APIs. For instance, the
user-agent and the supported languages and their order can be retrieved from
the HTTP header [36], while fonts can be extracted using Flash and CSS.
Although JavaScript has the navigator.userAgent API, the HTTP header is
preferred for retrieving the user-agent because the user can disable
JavaScript, in which case retrieval through the JavaScript API fails. In such
a situation, the HTTP header still provides the user-agent attribute of the
browser. For some attributes, such as the list of fonts on the device,
JavaScript does not offer any API; Flash and CSS are used to list the
available fonts.
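
The header-based extraction above can be sketched in Python as follows. This
is a simplified illustration, not the collection code used in our study: the
function names and the sample header values are assumptions, and the parser
handles only the common "tag;q=value" form of the Accept-Language header.

```python
from typing import Dict, List


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language value, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q-value has weight 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unparsable weight as least preferred
        # Sort by descending quality, then by original position in the header.
        entries.append((-quality, position, tag.strip()))
    entries.sort()
    return [tag for _, _, tag in entries]


def header_attributes(headers: Dict[str, str]) -> dict:
    """Extract the user-agent and the ordered language list from HTTP headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "user_agent": lowered.get("user-agent", ""),
        "languages": parse_accept_language(lowered.get("accept-language", "")),
    }


# Illustrative request headers (made-up values).
request_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0",
    "Accept-Language": "en-US,en;q=0.9,fr;q=0.8",
}
attributes = header_attributes(request_headers)
```

Because these values arrive with every HTTP request, they remain available
even when JavaScript is disabled in the victim's browser.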


4.2 Visual Attack 


We use the Panopticlick website [15] and the FingerprintJS demo website [8]
to assess the effectiveness of the various spoofing methods. We refer to this
assessment as the visual attack.


Attacking Panopticlick Site: Panopticlick provides a dashboard that displays
the browser information, which we leverage to assess our spoofing methods.
A figure in [37] presents a snapshot of the Panopticlick dashboard showing the
fingerprint information when a (victim) user uses a Firefox browser on a
Windows machine, i.e., “Win+Firefox”. By visually inspecting the information
displayed on the dashboard, we validated whether the spoofing methods succeed
in injecting spoofed attributes. We use the browser setting and debugging tool
to modify the following attributes used in Panopticlick: user-agent, HTTP
accept header, cookie enabled, and local storage. Specifically, we use the
debugging tool to change the user-agent, and the browser’s setting option to
change the language attribute found in the HTTP accept header. We change the
language category and its order in the Google Chrome browser to match the
target language combination. To modify cookie enabled and local storage, we
use the corresponding options in the privacy settings of the Google Chrome
browser. To change the remaining attributes used in Panopticlick, either the
script injection or the script modification approach can be used. Because
script modification is more convenient, we use it for this purpose.
Specifically, we change the attributes’ values in the JSON file of the script
so that Panopticlick receives the modified values.


Table 2. The attacks executed for each user in our evaluation methodology.

Attack Number  | 1 | 2 | 3  | 4  | 5  | 6  | 7   | 8   | 9
Time Gap (day) | 1 | 7 | 15 | 30 | 60 | 90 | 180 | 270 | 365


Attacking FingerprintJS Site and Real-Life Fingerprint Service: We also
successfully carried out the visual attack against the FingerprintJS website
and the FingerprintJS Pro service. Full details are listed in [37].


4.3 Algorithm Attack: Attacking Prominent Fingerprinting Based Techniques


We emulate the attack against the browser fingerprinting algorithms by simply
copying the entire fingerprint; we refer to this as the algorithm attack. To
evaluate its performance, we launch the algorithm attack against three
prominent browser fingerprinting algorithms: Panopticlick, RLA, and HLA. We
use the dataset from [49], referred to as the original dataset, for this
evaluation. Details on the dataset are provided in Sect. 5.1. Each fingerprint
in the dataset has the following three timestamps: created_date,
updated_date, and expired_date, which denote when the fingerprint was
created/recorded, updated, and expired, respectively. From the original
dataset, we generate several datasets, one per collect frequency, referred to
as the benign datasets.

In a real-world setting, an adversary can capture the victim’s browser
fingerprint at any point in time. Given this, we consider that the attacker
can spoof any of the fingerprints in the original dataset. Therefore, we copy
one fingerprint instance of the given user at a time, update its creation date
and order, treat it as a spoofed fingerprint, and inject it back into the
original dataset, forming the attack dataset. Such an injection of a copied
fingerprint simulates the scenario where an adversary acquires the victim’s
fingerprint and then tries to impersonate the victim by spoofing it. The
fingerprinting algorithms are executed on the attack dataset to link together
the browser fingerprints from the same user. The attack succeeds if the
fingerprinting algorithm incorrectly marks the spoofed fingerprint as coming
from the victim.

Since the browser fingerprint changes over time, the impact of the algorithm
attack may vary with the gap between the time when the fingerprint is acquired
and the time when the attack is launched, referred to as the “time gap”. In
terms of the dataset, the time gap is the difference in created_date between
two fingerprints from the same user. To measure the effect of the time gap on
our algorithm attack, we design and build nine attacks based on nine
different time gaps. The attack numbers and the corresponding time gaps are
presented in Table 2.

In the original dataset, each user has more than one fingerprint collected
over a long period of time. To execute the aforementioned nine attacks, we
assume that the adversary captures the oldest (i.e., the first) fingerprint of
the user and spoofs it ‘n’ days later, for each of the nine values of ‘n’
considered in the nine attacks; we refer to the result as the spoofed (or
copied) fingerprint. Thus, we consider the Acquire-Once-Spoof-Frequently
setting for our nine attacks. The created_date of the spoofed fingerprint is
set to ‘n’ days after its original created_date. Similarly, the expired_date
is set to 5 days after its created_date. Since none of the three algorithms
uses the updated_date, we set its value to “NULL”. Although we employ the
Acquire-Once-Spoof-Frequently approach for all our attacks, the results also
apply to Acquire-Once-Spoof-Once, where the fingerprint is spoofed only once.
If the fingerprint is acquired frequently over a period of time, our attack
would have a higher chance of succeeding.
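
The construction of the nine spoofed fingerprints can be sketched in Python as
follows. This is an illustrative sketch rather than the code used in our
evaluation: the function name, the attack_number bookkeeping field, and the
sample fingerprint are assumptions. The sketch also assumes that the 5-day
expiry is counted from the new created_date of the spoofed fingerprint.

```python
from datetime import datetime, timedelta
from typing import List

TIME_GAPS_DAYS = [1, 7, 15, 30, 60, 90, 180, 270, 365]  # time gaps from Table 2
EXPIRY_DAYS = 5  # spoofed fingerprints expire 5 days after creation


def make_spoofed_fingerprints(oldest: dict) -> List[dict]:
    """Copy the victim's oldest fingerprint once per time gap."""
    spoofed = []
    for attack_number, gap in enumerate(TIME_GAPS_DAYS, start=1):
        copy = dict(oldest)  # all browser attributes are copied unchanged
        copy["created_date"] = oldest["created_date"] + timedelta(days=gap)
        # Assumption: expiry is relative to the new created_date.
        copy["expired_date"] = copy["created_date"] + timedelta(days=EXPIRY_DAYS)
        copy["updated_date"] = None  # not used by any of the three algorithms
        copy["attack_number"] = attack_number  # hypothetical bookkeeping field
        spoofed.append(copy)
    return spoofed


# Illustrative oldest fingerprint of one user (made-up values).
oldest = {
    "id": "u1",
    "created_date": datetime(2015, 1, 1),
    "updated_date": datetime(2015, 1, 10),
    "expired_date": datetime(2015, 1, 20),
    "userAgent": "victim-UA",
}
spoofed = make_spoofed_fingerprints(oldest)
```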

To evaluate our algorithm attack, we use the exact same code as FP-Stalker,
which its authors have made publicly available in a GitHub repository [49].
The repository contains implementations of all three algorithms considered in
our study, namely Panopticlick, RLA, and HLA. For each user in the dataset, we
run these algorithms in two different settings: i) the benign setting without
any spoofed fingerprints, and ii) the attack setting with nine different
spoofed (or attack) fingerprints.


5 Dataset and Evaluation Methodology 


5.1 FP-Stalker Dataset 


We use the FP-Stalker dataset [49] to evaluate the performance of Gummy
Browsers against browser fingerprinting techniques. The authors of FP-Stalker
designed and built two extensions, one for the Firefox browser and the other
for the Chrome browser, and used the AmIUnique website to collect the browser
fingerprints. Although their paper reports a dataset of 98,598 browser
fingerprints from 1,905 users collected over a period of two years, their
public dataset contains only 15,000 fingerprints collected from 1,819 users.
Each fingerprint in the dataset contains 40 variables, 38 of which correspond
to browser fingerprinting attributes. The remaining two variables are
“Counter” and “ID”. The counter denotes the order of the fingerprint based on
its created date. The ID uniquely represents an individual user and is
referred to as the “original ID” in our analysis.

We observed inconsistencies in the dataset: the fingerprints from a given user
do not always have consistent browser attributes, e.g., they report different
operating systems, or a newer fingerprint reports an older browser
version/vendor than an older fingerprint. As such inconsistencies may affect
the performance of the browser fingerprinting algorithms as well as that of
our attack, we removed all inconsistent fingerprints, leaving a dataset with
fingerprints from 275 users. Further, we removed the users with fewer than
seven fingerprints, which is considered insufficient for the three
fingerprinting algorithms, reducing the user count in the dataset from 275 to
239. This is the dataset we use to evaluate our attack.
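
A minimal Python sketch of this cleaning step is given below. It is a
simplification under stated assumptions, not our actual preprocessing code:
the field names (id, os, browser_version, created_date) are hypothetical, the
consistency rule is reduced to "same operating system and non-decreasing
browser version", and a user with any inconsistency is dropped entirely.

```python
from collections import defaultdict
from typing import Dict, List

MIN_FINGERPRINTS = 7  # users with fewer fingerprints are dropped


def is_consistent(history: List[dict]) -> bool:
    """Check one user's fingerprints under the simplified rule above."""
    ordered = sorted(history, key=lambda fp: fp["created_date"])
    same_os = len({fp["os"] for fp in ordered}) == 1
    versions = [fp["browser_version"] for fp in ordered]
    non_decreasing = all(a <= b for a, b in zip(versions, versions[1:]))
    return same_os and non_decreasing


def clean_dataset(fingerprints: List[dict]) -> Dict[str, List[dict]]:
    """Group fingerprints by user and keep consistent, sufficiently large users."""
    by_user = defaultdict(list)
    for fp in fingerprints:
        by_user[fp["id"]].append(fp)
    return {
        user: history
        for user, history in by_user.items()
        if is_consistent(history) and len(history) >= MIN_FINGERPRINTS
    }
```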


Collect Frequency: We sample the dataset using a configurable collect
frequency, similar to FP-Stalker. The collect frequency indicates how often a
browser is fingerprinted. The less often a browser is fingerprinted (i.e., the
higher the collect frequency), the harder it is to track the user. We use 11
different collect frequencies, in days: 1, 2, 3, 4, 5, 6, 7, 8, 10, 15, and
20. To generate a dataset for a given collect frequency, we employ the
approach suggested in FP-Stalker. When a dataset is sampled using a collect
frequency, the approach usually extends the dataset by copying (or
replicating) the fingerprints at missing dates; we therefore refer to it as
the expansion algorithm. The expansion algorithm iterates in time with a step
of collect frequency days and creates (or recovers) the browser fingerprint at
each time step $t + f_c \times i$, where $t$ is the fingerprint creation date,
$f_c$ is the collect frequency, and $i$ is a natural number. The iteration
continues until the expired date of the previous and the current fingerprint
is reached. The process is repeated for each of the fingerprints collected
from the given user. Thus, the expansion algorithm generates a new dataset
with the fingerprints sampled at a consistent frequency of $f_c$ days.
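
The expansion step can be sketched in Python as follows. This is a simplified
illustration and not the FP-Stalker implementation: the function name and the
collected_at field are assumptions, and the stopping rule is reduced to
"replicate a fingerprint at t + f_c * i while that time is before the
fingerprint's own expired_date".

```python
from datetime import datetime, timedelta
from typing import List


def expand(fingerprints: List[dict], collect_frequency: int) -> List[dict]:
    """Replicate each fingerprint every collect_frequency days until it expires."""
    step = timedelta(days=collect_frequency)
    expanded = []
    for fp in sorted(fingerprints, key=lambda f: f["created_date"]):
        i = 0
        # Time steps t + f_c * i, with t the creation date of this fingerprint.
        while fp["created_date"] + i * step < fp["expired_date"]:
            replica = dict(fp)
            replica["collected_at"] = fp["created_date"] + i * step
            expanded.append(replica)
            i += 1
    return expanded


# Illustrative fingerprints of one user (made-up dates).
user_fingerprints = [
    {"id": "u1", "created_date": datetime(2016, 1, 1), "expired_date": datetime(2016, 1, 6)},
    {"id": "u1", "created_date": datetime(2016, 1, 6), "expired_date": datetime(2016, 1, 8)},
]
sampled = expand(user_fingerprints, collect_frequency=2)
```

With a collect frequency of 2 days, the first fingerprint in this example is
replicated on January 1, 3, and 5, and the second on January 6.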


5.2 Evaluation Methodology 


5.2.1 Visual Attack

We leverage the Panopticlick website and the FingerprintJS demo website,
together with various combinations of terminal and browser that the victim
user may use, to visually assess the spoofing methods. As terminals, we employ
a Mac laptop running macOS 10.14 Mojave, an Android phone running Android 9.0
Pie, and a Windows desktop running Windows 10; as browsers, we use Google
Chrome, Mozilla Firefox, Microsoft Edge, and Tor. Using the Panopticlick
website, we note all the fingerprinting features for the different
terminal-browser combinations.

For the purpose of our evaluation, we consider that the attacker uses the
Google Chrome browser on the Mac laptop, i.e., “Mac+Chrome”, to launch the
attack. We believe that this is a very standard setup that an attacker can
readily use to launch the spoofing attack. Since the user may use different
combinations of terminal and browser to access the target website, we treat
the browser fingerprints obtained from all the remaining terminal-browser
combinations as the victim’s browser fingerprints. We spoof each of the
victim’s fingerprints on the attacker’s Mac+Chrome setup using the spoofing
methods detailed in Sect. 3.2. To validate whether the spoofing succeeded, we
compare the fingerprint shown in the attacker’s browser after spoofing with
the previously noted victim’s fingerprint.


5.2.2 Algorithm Attack Evaluation Scenarios 

As mentioned earlier, to emulate our attack against the three fingerprinting
algorithms, we insert nine spoofed fingerprints, one for each of our nine
attacks, into the original dataset. We inject each spoofed fingerprint after
the latest fingerprint in the dataset that has an earlier (or the same)
created date than that of the spoofed fingerprint. Because the counter in the
dataset represents the order of the fingerprints by created date, the counter
is reassigned whenever a spoofed fingerprint is injected. Thus, after
injecting all nine spoofed fingerprints, the new dataset contains 15,009
fingerprints (the original dataset had 15,000 fingerprints), with a corrected
order in terms of the counter.
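
The injection and renumbering can be sketched in Python as follows. The
function name and the sample records are illustrative assumptions, and the
sketch assumes that the counter starts at 1.

```python
import bisect
from datetime import date
from typing import List


def inject_spoofed(dataset: List[dict], spoofed: dict) -> List[dict]:
    """Insert a spoofed fingerprint in created-date order and renumber counters.

    The spoofed fingerprint is placed after the latest fingerprint whose
    created date is earlier than or equal to its own.
    """
    ordered = sorted(dataset, key=lambda fp: fp["created_date"])
    created = [fp["created_date"] for fp in ordered]
    position = bisect.bisect_right(created, spoofed["created_date"])
    merged = ordered[:position] + [spoofed] + ordered[position:]
    # Reassign the counter so that it again reflects the created-date order.
    return [dict(fp, counter=index) for index, fp in enumerate(merged, start=1)]


# Illustrative dataset (made-up values).
dataset = [
    {"id": "u1", "counter": 1, "created_date": date(2016, 1, 1)},
    {"id": "u2", "counter": 2, "created_date": date(2016, 1, 3)},
    {"id": "u1", "counter": 3, "created_date": date(2016, 1, 3)},
    {"id": "u2", "counter": 4, "created_date": date(2016, 1, 10)},
]
spoofed = {"id": "u1", "created_date": date(2016, 1, 3), "spoofed": True}
attack_dataset = inject_spoofed(dataset, spoofed)
```

The function returns a new list and leaves the input untouched, which makes it
straightforward to revert to the original dataset before attacking the next
user.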

In our evaluation, we choose one user as the victim at a time and evaluate
our attack against the three fingerprinting algorithms: the nine spoofed
fingerprints corresponding to the chosen user are injected into the original
dataset, generating the attack dataset. Afterwards, the attack dataset is
reverted to the original dataset. We repeat the process for each user in the
dataset, resulting in a total of 239 attacks (for 239 users).

FP-Stalker uses 40% of the total fingerprints as a training dataset and the 
remaining fingerprints as the testing dataset. The fingerprint dataset is extended 
leveraging the expansion algorithm (which is based on the collect frequency 
provided in FP-Stalker) resulting in a sufficiently large fingerprint dataset. 


Evaluation Metrics: To evaluate the performance of fingerprinting algorithms 
in the benign setting (using benign dataset), we use true positive rate (TPR), 
whereas, to evaluate the performance of our attack against fingerprinting algo- 
rithms, we use false positive rate (FPR). TPR measures how often the legiti- 
mate fingerprints have been correctly identified as belonging to the correct user’s 
device. FPR measures how often the spoofed fingerprints are incorrectly identi- 
fied as belonging to the victim. 
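
As a sketch of how these two metrics can be computed separately for each
tracking day, consider the following Python snippet. The record layout (day,
spoofed, target_id, assigned_id) is a hypothetical one chosen for
illustration: target_id is the true user for a legitimate fingerprint and the
impersonated victim for a spoofed one, and assigned_id is the ID chosen by the
fingerprinting algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def daily_rates(records: List[dict]) -> Dict[int, Tuple[Optional[float], Optional[float]]]:
    """Return {day: (TPR, FPR)}; a rate is None when the day has no such records."""
    totals = defaultdict(lambda: {"legit": 0, "tp": 0, "spoofed": 0, "fp": 0})
    for record in records:
        day = totals[record["day"]]
        linked = record["assigned_id"] == record["target_id"]
        if record["spoofed"]:
            day["spoofed"] += 1
            day["fp"] += int(linked)  # spoofed fingerprint linked to the victim
        else:
            day["legit"] += 1
            day["tp"] += int(linked)  # legitimate fingerprint linked to its user
    rates = {}
    for key in sorted(totals):
        day = totals[key]
        tpr = day["tp"] / day["legit"] if day["legit"] else None
        fpr = day["fp"] / day["spoofed"] if day["spoofed"] else None
        rates[key] = (tpr, fpr)
    return rates


# Illustrative linking results (made-up values).
records = [
    {"day": 1, "spoofed": False, "target_id": "u1", "assigned_id": "u1"},
    {"day": 1, "spoofed": False, "target_id": "u2", "assigned_id": "u9"},
    {"day": 1, "spoofed": True, "target_id": "u1", "assigned_id": "u1"},
    {"day": 2, "spoofed": False, "target_id": "u2", "assigned_id": "u2"},
]
rates = daily_rates(records)
```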

In our evaluation, since we consider the tracking of the user over a period of
time, we compute TPR and FPR for each day separately. When computing TPR and
FPR for a given day, we consider only the fingerprints from that particular
day. We expect the TPR to be high (close to 1), which indicates that the
benign user is still tracked well in the presence of the Gummy Browsers
attack. We also expect the FPR to be close to 1, which indicates that the
attack is highly successful.


6 Results


6.1 Visual Attack Results 


We successfully spoofed all the fingerprinting information on the Panopticlick
and FingerprintJS websites. The full details of the spoofing results are
listed in [37].


6.2 Algorithm Attack Results 


6.2.1 Benign Setting 

To validate the implementation of the three algorithms (obtained from the
FP-Stalker repository), we plot, as in FP-Stalker, various graphs of the
performance of these algorithms at tracking the users. Figure 2 shows the
average tracking duration (and a figure in [37] shows the average of the
maximum tracking duration) as a function of collect frequency for the three
fingerprinting algorithms. The tracking duration indicates how long (in days)
the fingerprinting algorithm can track the user. A higher average tracking
duration is better for user tracking. Figure 2 (and the figure in [37]) shows
that HLA outperforms Panopticlick and RLA at tracking the user, which is in
line with the results reported in FP-Stalker. Further, we achieved results
similar to those reported in FP-Stalker for each of the three fingerprinting
algorithms.

Fig. 2. Average tracking duration as a function of collect frequency for three
different algorithms.

Fig. 3. Average ownership as a function of collect frequency for three
fingerprinting algorithms.

Figure 3 shows the average ownership as a function of collect frequency.
Ownership indicates how often the fingerprints were correctly associated with
their actual users by the fingerprinting algorithms. The higher the ownership
score, the better the performance of the fingerprinting algorithm. We achieved
an average ownership above 0.95 for all three fingerprinting algorithms, which
is in line with that reported in FP-Stalker [49]. A figure in [37] shows the
number of new IDs assigned to each user as a function of collect frequency for
the three fingerprinting algorithms. If the number of new IDs assigned to a
user is ‘1’, all his fingerprints have been identified as coming from the
original user (the best result). If the number of newly assigned IDs is larger
than ‘1’, say ‘n’, the user’s fingerprints are still tracked correctly, but as
‘n’ separate tracking durations, which appear to come from ‘n’ different
users. Although we used the exact same implementation of the three algorithms
from FP-Stalker, we achieved slightly different results compared to those in
FP-Stalker. We attribute this difference to the difference in the size of our
dataset (239 users) compared to that used in FP-Stalker (1905 users).
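
The "number of new IDs per user" measure can be sketched in Python as follows.
The function name and the input layout, a list of (original ID, assigned ID)
pairs with one pair per fingerprint, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def assigned_ids_per_user(links: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many distinct IDs the algorithm assigned to each original user."""
    ids = defaultdict(set)
    for original_id, assigned_id in links:
        ids[original_id].add(assigned_id)
    return {user: len(assigned) for user, assigned in ids.items()}


# Illustrative linking output (made-up values): u1 is tracked as a single
# user, whereas the fingerprints of u2 are split across two assigned IDs.
links = [("u1", "a"), ("u1", "a"), ("u2", "b"), ("u2", "c"), ("u2", "b")]
counts = assigned_ids_per_user(links)
```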

Figures 4a, 4b, and 4c show the performance of the tracking algorithms in the
benign setting. Specifically, they show the TPRs as a function of tracking
days (when the collect frequency is set to 1) for Panopticlick, RLA, and HLA,
respectively. As before, RLA and HLA perform better than Panopticlick.


6.2.2 Attack Setting 

To evaluate the performance of our attack and its impact on the tracking of
legitimate users, we compute the average TPR and the average FPR over the 239
attacks. Figures 5a, 5b, and 5c show the average TPRs as a function of
tracking days in the attack setting for Panopticlick, RLA, and HLA,
respectively, when the collect frequency is set to 1. Comparing these TPRs
with those in the benign setting, we see only a very minor difference,
potentially because of the addition of the spoofed fingerprints in the attack
setting. This indicates that our attack does not have any significant impact
on the performance of the fingerprinting algorithms.

Similarly, Figs. 6a, 6b, and 6c show the average FPRs as a function of
tracking days in the attack setting for Panopticlick, RLA, and HLA,
respectively, when the collect frequency is set to 1. We achieved average FPRs
greater than 0.95, mostly close to 1.00, which indicates that most of the
spoofed fingerprints were mistaken for legitimate ones. In other words, these
results show that our attacks were highly successful in fooling the
fingerprinting algorithms into treating the spoofed fingerprints as legitimate
fingerprints. We note that similar results were achieved in both the benign
and attack settings when the collect frequency was set to values other than 1.

Fig. 4. True positive rate (TPR) as a function of tracking days in the benign
setting when the collect frequency is set to 1.

Fig. 5. True positive rate (TPR) as a function of tracking days in the attack
setting when the collect frequency is set to 1.

Fig. 6. False positive rate (FPR) as a function of tracking days in the attack
setting when the collect frequency is set to 1.


7 Implications of Our Attack


Because browser fingerprinting is processed at the backend (i.e., the remote
server) of the website, and no web services state that they use any browser
fingerprinting approach, we could not verify the actual impact of our attacks
without inspecting the backend code of the website. However, our results show
that if a website were to implement only fingerprinting techniques (without
integration with any cookies, caches, or authentication mechanisms), our
attack can have a significant impact on the user’s privacy and on security
applications, as described below.


Compromising Ad Privacy: A prior study [26] has shown that one can build a
user’s personal profile simply by monitoring the user’s personalized ads. In
our attack, the attacker succeeds in presenting his device to a target website
as if it were the victim’s device, using various spoofing methods. If the
target website uses only browser fingerprinting to track the user and to
personalize ads, then the same or similar ads, or ads from the same category,
would show up on the attacker’s device. Given this, the attacker may learn
various sensitive information about the user, including his gender, age group,
potential location, and habits. Further, the attacker can sell this
information for personal and financial gain.




Defeating User Authentication: The purpose of browser fingerprinting in
authentication is to remember previously used devices and thereby enhance the
security of traditional authentication methods such as passwords. For account
login, assuming that the attacker has obtained the victim’s login credentials
(i.e., the username and password), the target website will mistake the
attacker for the victim using a previously seen device, since the attacker
presents his device as the victim’s device in our attack. Authentication
mechanisms based only on browser fingerprinting cannot block such an attack.


Bypassing Fraud Detection: Given that many fraud detection techniques use
browser fingerprinting information, the attacker can circumvent the detection
by presenting his device as the victim’s device using various spoofing
methods. Unless the victim makes a major change to his device (e.g., switching
to a different operating system, downgrading the system version, or replacing
hardware), the attacker can impersonate the victim and bypass the detection.
Generally, the attacker would be unaware of such major changes. However, the
attacker can always obtain the most recent browser fingerprint by simply
luring the user into visiting an attacker-designed website. Given this, a
fraud-detection algorithm based solely on browser fingerprinting cannot thwart
our attack; it needs additional metrics to detect fraud.


8 Discussion 


Potential Attack Detection: The web service may detect our attack if the
adversary does not follow the correct data format, provides invalid data, or
takes longer than the set time limit. However, the attack can remain
undetected if the adversary carefully provides correct and valid spoofed data
within the set time limit. With the script injection approach, the attacker
should replace JavaScript API values with valid values, e.g., in the Date()
object, ‘year’ should be replaced with ‘year’ (not ‘month’). With the script
modification approach, the attacker has to use the correct data format in the
return value, e.g., ‘2020-04-12’ can be replaced with ‘2020-03-29’, but not
with ‘2020.04.12’. We note that the spoofed date should not be older than the
current date. To detect our attack, the web service may periodically request a
response from the website running on the client machine, e.g., request the
current time every 5 s. When modifying the script, the adversary needs to
pause all the scripts on the website, which prevents the website from sending
the response to the web service. When the web service does not receive the
expected response from the client machine in a timely manner, it can detect
the potential attack. However, the attacker can use a pre-designed script to
overwrite the existing scripts in the targeted website. Such a pre-designed
script automates the script modification process, thereby defeating the above
detection approach.
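
The two server-side checks discussed above, format validation and periodic
response timing, can be sketched in Python as follows. This is a hypothetical
illustration of what a web service might do, not a description of any deployed
defense: the function names, the date format, and the slack parameter are
assumptions.

```python
import re
from datetime import datetime
from typing import Sequence

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # e.g., 2020-04-12


def has_expected_date_format(value: str) -> bool:
    """Accept only well-formed YYYY-MM-DD calendar dates."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # e.g., month 13 or day 32
    return True


def missed_response(arrival_times: Sequence[float], interval: float = 5.0,
                    slack: float = 1.0) -> bool:
    """Return True if any gap between consecutive responses exceeds the limit.

    arrival_times are response timestamps in seconds; interval is the polling
    period (5 s in the example above) and slack is an assumed tolerance.
    """
    gaps = (later - earlier for earlier, later in zip(arrival_times, arrival_times[1:]))
    return any(gap > interval + slack for gap in gaps)
```

A paused script (e.g., one stopped at a breakpoint) would show up as a long gap
in arrival_times, whereas an automated pre-designed script would keep the gaps
within the limit.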


Limitations and Future Work: Although the fingerprinting techniques considered
in our study use many attributes, they exclude several attributes used in
other fingerprinting algorithms, such as those related to network and
protocols (e.g., TCP/IP stack fingerprinting [32], DNS resolver [35]) and the
hardware sensors [26]. The impact of these attributes on the performance of
our attack has not been assessed in our study, and further investigation is
needed in this direction. As noted earlier, the current dataset is
insufficient to evaluate the performance of the fingerprinting algorithms and
that of our attacks beyond 130 tracking days. A further study with a larger
dataset is needed to assess the performance of our attack over a longer
tracking period. Our study assumes that the attacker can fully spoof all
browser information obtained from the victim’s device. In some scenarios, the
captured information may be outdated, so that only part of the browser
information is spoofed correctly, which may affect our attack. Future work is
needed to evaluate the impact of partial spoofing on the performance of our
attack. Furthermore, an ethically sound study of attacking personalized ads,
authentication, and fraud detection schemes that use fingerprinting in the
real world via Gummy Browsers should be conducted in future work. Our spoofing
methods (detailed in Sect. 4.2) can also be extended into an evasion technique
that hides the user’s true identity by creating and supplying a fake browser
fingerprint to the visited website. As with Gummy Browsers, the evasion can go
unnoticed by the target website. The impact of such evasion, and the subtle
differences between Gummy Browsers and the evasion technique, should be
evaluated and discussed in future work.


9 Conclusion 


In this paper, we identified a novel and serious threat to the well-studied
and popular notion of browser fingerprinting. Specifically, we showed that an
attacker can make his own browser appear as the victim’s browser by simply
capturing the browser fingerprint (through an attacker-controlled or a
malicious website) and mimicking it (through script injection/modification or
by leveraging the browser’s built-in settings and debugging tools). By
exploiting this threat, we introduced and designed Gummy Browsers, an attack
system that would enable a malicious entity to subvert any web application
that uses browser fingerprinting, for example, to glean various sensitive
information about the user in a targeted advertising application and to
compromise the security of online defensive schemes, such as user
authentication and fraud detection. We evaluated the performance of Gummy
Browsers against state-of-the-art browser fingerprinting techniques,
Panopticlick and FP-Stalker. Our results showed that Gummy Browsers can
successfully impersonate the victim’s browser transparently almost all the
time, without affecting the tracking of legitimate users. Since acquiring and
spoofing the browser characteristics goes unnoticed by both the user and the
remote web server, Gummy Browsers can be launched easily while remaining hard
to detect. The impact of Gummy Browsers on the online security and privacy of
users can be devastating and lasting, especially given that browser
fingerprinting is starting to get widely adopted in the real world. In light
of this attack, our work raises the question of whether browser fingerprinting
is safe to deploy on a large scale.


Abstract. Industrial Control Systems (ICSs) rely on insecure protocols
and devices to monitor and operate critical infrastructure. Prior work 
has demonstrated that powerful attackers with detailed system knowl- 
edge can manipulate exchanged sensor data to deteriorate performance 
of the process, even leading to full shutdowns of plants. Identifying those 
attacks requires iterating over all possible sensor values, and running 
detailed system simulation or analysis to identify optimal attacks. That 
setup allows adversaries to identify attacks that are most impactful when 
applied on the system for the first time, before the system operators 
become aware of the manipulations. 

In this work, we investigate if constrained attackers without detailed 
system knowledge and simulators can identify comparable attacks. In 
particular, the attacker only requires abstract knowledge on general infor- 
mation flow in the plant, instead of precise algorithms, operating param- 
eters, process models, or simulators. We propose an approach that allows
single-shot attacks, i.e., near-optimal attacks that reliably shut
down a system on the first try. The approach is applied and validated on
two use cases, and demonstrated to achieve comparable results to prior 
work, which relied on detailed system information and simulations. 


1 Introduction 


Attacks on Industrial Control Systems (ICSs) have been thoroughly investigated
in the post-Stuxnet era. Different initiatives such as Mitre’s ATT&CK for ICS
and SANS Institute’s ICS Cyber Kill Chain have emerged to systematically
analyze ICS attacks [6,23]. After a decade of research, it became clear that the
difficulty of ICS attacks is not the execution itself but the preparation [12].
The latter stages of an ICS attack are considered easy because most industrial
protocols lack security services such as message encryption and authentication,
Programmable Logic Controllers (PLCs) remain operational for years without
software updates, Human-Machine Interfaces (HMIs) are left unprotected, and so
on. The difficulty of preparing an ICS attack is due to the complexity of
ICSs, which can have hundreds of components (e.g., sensors, setpoints,
actuators): knowing what to target and how to target it is far from trivial.

Most research on ICSs security argues, or simply assumes, that detailed knowl- 
edge about the system is a requirement for a successful compromise. In this work, 
we challenge such a claim by considering an attacker whose goal is to destabilize 
the physical process using only limited process knowledge. Despite this prepara- 
tion constraint, the attack (1) must target one component only once (single-shot); 
and (2) must have a fast effect after the attack execution (near-optimal). The first 
requirement aims at executing stealthy and hard-to-detect attacks. The second
requirement discards attacks whose impact takes a long time to materialize,
which would increase the chance of detection and defensive reaction. More
precisely, we call ‘near-optimal’ the top-3 fastest attacks under the same
system and attack conditions.

Previous research has identified the data sources required by an attacker to
prepare a successful attack against ICSs [12]: PLC configuration,
HMI/Workstation configuration, historian configuration, network traffic,
system/component constraints, and piping and instrumentation diagrams (P&IDs).
The main argument is that only through the combination of multiple data
sources is an attacker able to reach the level of “process comprehension”
required to launch an attack. Previous works confirm this claim. We analyzed the
attack preparation phase of several papers and categorized their requirements
according to the data sources presented in [12]. Table 1 shows that previous
works use multiple data sources to prepare an ICS attack.

In this work, we propose a new approach that only requires limited knowledge
of the system’s architecture, and still allows an attacker to identify
near-optimal single-shot attacks. Our approach requires an abstract
representation of the information flow in the system: which sensors and
setpoints influence which control decision, and which actuators are controlled
by which control function. We show how to obtain this information (e.g., from
P&IDs), and how to express it in an abstract graph representation. We then
leverage the accumulated knowledge on IT software weaknesses and apply it to the
ICS domain. In particular, we use weaknesses from Mitre’s Common Weakness
Enumeration (CWE) database, and translate them to specific graph patterns. Our
goal is to look for these patterns in the ICS graph to identify suitable
targets. From this set of targets, the attacker picks one (e.g., randomly) and
executes a single-shot attack.

We evaluate our approach using a simulated ICS known as the Tennessee
Eastman Plant (TEP). We used two different implementations of the TEP and
built the corresponding CCL graphs. Our analysis of the graphs reveals
weaknesses that point to the components we hypothesize are best positioned to
compromise the availability of the plant. We then execute simulations to test
our hypotheses. We found that there is a correlation between the targets
automatically

Table 1. Comparison of the required attacker knowledge in previous works. Data
sources are based on the process knowledge data source taxonomy from [12]. P&ID:
Piping and Instrumentation Diagram, PLC: Programmable Logic Controller, HMI:
Human-Machine Interface, WS: Workstation.


Data source Work

[2] [3] [7] [8] [9] [13] [15] [16] [19] [28] Our
PLC configuration v v 
HMI/WS configuration vv V v 
Historian configuration V V Vv 4s Vv Vv vv 
Network traffic viv 
P&ID v v VV Vv VV Vv 


chosen by our approach and the components that, according to the simulations, 
are prone to cause a shutdown of the plant. Although different security aspects 
of the TEP have been studied in the past, most of them have run simulations 
under very limited conditions [13,15]. To gain more confidence in our results, we 
executed 748 individual simulations under 14 different conditions that account 
for 2.52 years of simulated time. 

Summarizing, our two main contributions are as follows. (1) We present a 
novel method to identify near-optimal single-shot attacks on ICSs. Unlike previ- 
ous offensive approaches, our work uses limited process knowledge, thus, offering 
new insights about the actual security risks in ICSs. (2) To validate our app- 
roach, we executed, documented, and published the most extensive set of sim- 
ulated attacks on the TEP to date. The code, results of each simulation, and 
screenshots are provided in the corresponding repositories. 


2 Background 


2.1 Closed Control Loops 


Closed Control Loops (CCLs) constitute a basic programming pattern for ICSs. 
A CCL is comprised of four basic components, namely, a setpoint, a sensor, 
a control function, and an actuator (e.g., valve, heater, light, etc.). The CCL’s 
control function receives inputs from the environment through sensors, compares 
them with pre-established setpoints, and reacts with a compensatory action 
intended to minimize the difference. The compensatory action is executed by 
an actuator in order to control the physical variable measured by the sensor 
(e.g., pressure, temperature, illumination, etc.). Figure 1 depicts the simplest 
form of a CCL. 
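
The feedback cycle described above can be illustrated with a minimal sketch. It assumes a toy process in which the actuator's action is added directly to the measured variable, and a purely proportional control function; the function name, the gain, and the numeric values are illustrative assumptions and are not taken from the paper.

```python
def run_closed_loop(setpoint, initial_value, gain=0.5, steps=20):
    """Simulate a toy closed control loop and return the trace of the variable.

    Assumption: the physical process is trivial, i.e., the actuator's action is
    simply added to the controlled variable at every step.
    """
    value = initial_value
    trace = []
    for _ in range(steps):
        reading = value              # sensor: measure the physical variable
        error = setpoint - reading   # control function: compare with the setpoint
        action = gain * error        # control function: compensatory action
        value += action              # actuator: act on the physical variable
        trace.append(value)
    return trace


trace = run_closed_loop(setpoint=50.0, initial_value=40.0)
print(f"final value: {trace[-1]:.4f}")
```

With a gain of 0.5 the difference between the variable and the setpoint halves at every step, which is the "minimize the difference" behavior of the control function in its simplest form.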

Fig. 1. A CCL comprised of a setpoint (sp), a sensor (se), a control function (f), and
an actuator (a). Solid lines represent communication in the cyber domain whereas the 
dashed line represents interaction in the physical realm. 


The control function is a software component typically running in high avail- 
ability embedded systems such as Programmable Logic Controllers (PLCs). Con- 
trol functions implement the control logic; for example, arithmetic operations, 
rate limiters, and other kinds of data processing functions. From the function’s 
perspective, hardware devices such as sensors and actuators are abstracted sim- 
ply as variables to be read and written. It is worth noting that in distributed 
control systems these variables might not necessarily reside on the same PLC. 
Therefore, communication to and from the control function might require net- 
work transmissions. 

Advanced CCL configurations are often needed, e.g., to cope with system
disturbances. One such configuration is called cascade control, where one
control function adjusts the setpoint of another control function [32]. This
dynamically computed setpoint is called a calculated setpoint. In contrast, we
denote user-defined setpoints as static setpoints. Although we typically make an
explicit distinction between static setpoints and calculated setpoints, in what
follows, we use the word setpoint to refer to either of them when such a
distinction is irrelevant. An example of cascade control is shown in Fig. 2a.

Another advanced CCL configuration is called override control. In this
setting, one control function manipulates one variable during normal operation;
however, a second control function can take over during abnormal operation to
prevent some safety, process, or equipment limit from being exceeded [32]. The
notion of normal and abnormal operation is dependent on the physical process 
under control. The variable under control by two or more control functions is 
typically used to manipulate one actuator or calculated setpoint. Figure 2b shows 
a graphical representation of the override control technique. 

It is also common to find setpoints and sensors shared between two or more 
control functions, as shown in Fig. 2c and Fig. 2d, respectively. Shared setpoints 
provide a convenient centralized configuration for multiple control functions at 
once. The motivation behind shared sensors is similar to that of shared setpoints, 
which in addition reflects the fact that typically there are limited instances of 
physical sensors in the system under control. An arbitrary number of the CCL 
configurations shown in Fig. 2 can be used to control ICSs [29]. 

(a) Cascade control. (b) Override control. (c) Shared setpoint. (d) Shared sensor.


Fig. 2. Advanced CCL configurations. Color code: gray: static setpoint (ss), blue: con- 
trolling function (f), red: actuator (a), purple: sensor (se), and yellow: calculated set- 
point (cs). (Color figure online) 


2.2 Process Knowledge Data Sources 


Process comprehension is the main challenge attackers need to overcome to 
design successful and efficient attack strategies. Based on their capabilities, 
attackers can access different data sources of the targeted system, from col- 
lecting network captures to full access to an operator’s workstation. Each data 
source provides a different view of the system. Depending on the attacker’s inter- 
ests, they could find some data sources more useful than others. We follow the 
data source taxonomy presented in [12] and describe the information that can 
be derived from them. The data sources explored in their work include: 


PLC Configuration. PLCs are one of the most valuable components of ICSs. As 
they run the control logic that governs the process, they contain all the mecha- 
nisms to manipulate the system’s physical properties. An attacker can learn the 
system’s control logic and understand what sensors and network messages could 
jeopardize the system more quickly. 


HMI/Workstation Configuration. HMIs are a valuable target for attackers
because they typically run on Windows-based machines and have a comprehensive
view of the system, as was shown in the Ukraine power grid incident [18].
Attackers with access to HMIs have a larger view of the system than individual
PLCs provide, and could infer general characteristics of the system, such as
the type of industry they control [28].


Historian Configuration. Historian data is also a valuable data source for attack- 
ers. As historians store previous traces of the system, they provide helpful infor- 
mation for attackers to understand the nominal behavior of the system. His- 
torians are particularly valuable for model learning and the design of stealthy 
attacks [19]. 


Network Traffic. Network flows describe how devices interact with each other. 
Attackers can sneak into the system’s network and passively learn patterns of 
the system’s behavior. Networks provide a source of realistic attack vectors that 
can drive the system into unsafe states [8,9]. 


Piping and Instrumentation Diagram (P&ID). P&IDs show the functional
relationship between the main components of the system, including piping,
instrumentation and control devices. Attackers could learn valuable information
about operational equipment and deduce which computing devices are worth
attacking.


3 Identifying Near-Optimal Single-Shot Attacks 


3.1 System Model 


In this work, we focus on generic Industrial Control Systems (ICSs) implemented 
using a common programming pattern known as Closed Control Loop (CCL). 
The structure of CCLs is typically comprised of 4 components: a control func- 
tion, a setpoint, a sensor, and an actuator. Control functions run in high avail- 
ability embedded systems such as PLCs. Setpoints, sensors, and actuators are 
abstracted as variables that serve as inputs and outputs for control functions. 
The system might be distributed, meaning that the components of a single CCL 
reside on different PLCs but are interconnected through a communication net- 
work and exchange messages using standard ICS protocols (e.g., Modbus, Eth- 
erNet/IP, BACnet). 


3.2 Attacker Model 


Attacker Goal. We consider an attacker that aims to cause process damage (e.g.,
an emergency process shutdown) as fast as possible, since the likelihood of
detection rises with longer attacks.


Attacker Capabilities. To achieve this goal, the attacker will manipulate the
values reported by a specific sensor. This choice simplifies the analysis and is
consistent with related work [13,15], but it also limits the discussion to
attacks on single sensors. Exactly how the sensor is manipulated is out of
scope. Prior work has demonstrated many ways to achieve this via physical-layer
manipulations [30], traffic manipulations [15], or direct attacks on the PLCs or
other hosts [1]. Prior work has also discussed a number of approaches to select
the values to spoof. In this work we use the constant and minimum/maximum value
attacks from [7,31].


Attacker Knowledge. As obtaining information on the target process is costly
(e.g., reconnaissance effort, bribes, industrial espionage), the attacker aims to
minimize the types of information required to successfully run an attack. In
particular, we assume an attacker that only has access to an abstract P&ID. We
show later how a CCL graph can be derived from this diagram.


Lack of Simulation. In particular, the attacker does not have access to detailed 
physical process simulation environments. This implies (combined with the lack 
of detailed knowledge on the attacked process) that the effect of the attacker’s 
manipulations cannot be reliably predicted in advance. 


3.3 Research Questions and Challenges


Given this system and attacker model we address the following research question: 


Given the set of all ICS sensors in the target system, how can the attacker 
identify the sensor to manipulate to achieve a near-optimal single-shot attack, 
with limited knowledge on the attacked process and system (i.e., a CCL graph)? 


Here we motivate the single-shot and near-optimal attack requirements in 
detail. 


Need for Single-Shot Attacks. While the system under attack does not use
process-aware attack detection systems [31], the human operators will
eventually detect anomalous system conditions. Once anomalous system conditions
are detected, a full-scale forensic investigation would be launched, which would
remove the attacker’s ability to launch further attacks. This means that the
attacker ideally has to perform an efficient attack on their first try. We call
such attacks single-shot attacks.


Near-Optimal Attacks. Out of a large set of possible attacks within our attacker 
model, we expect there to be an attack that is optimal in the sense that its 
sensor manipulation leads to the fastest possible intended effect (i.e., shutdown 
of the process). But there might be other attacks that are nearly as fast. For the 
purpose of our evaluation, we introduce the concept of near-optimal attacks. In 
particular, when ranking all possible attacks by their expected shutdown time, 
we call the three best attacks near-optimal. We will also discuss alternative 
efficiency comparisons later in our discussion. 
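
As a small illustration of this definition, the sketch below ranks attacks by shutdown time and keeps the three fastest. The function name, the attack labels, and the shutdown times are hypothetical values chosen for illustration; attacks that never shut the plant down are represented by `None` and are excluded from the ranking, which is an assumption of this sketch.

```python
def near_optimal_attacks(shutdown_times, k=3):
    """Return the k attacks with the shortest shutdown time.

    shutdown_times maps an attack label to its shutdown time in hours, or to
    None when the attack did not lead to a shutdown.
    """
    effective = {a: t for a, t in shutdown_times.items() if t is not None}
    # Ties are broken by label so that the ranking is deterministic.
    ranked = sorted(effective, key=lambda a: (effective[a], a))
    return ranked[:k]


# Hypothetical shutdown times, for illustration only.
example = {"s1": 5.2, "s2": None, "s3": 0.8, "s4": 12.0, "s5": 3.1}
print(near_optimal_attacks(example))
```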


Challenge. We note that prior ICS security research argues (or simply assumes) 
that detailed knowledge about the system under attack is a requirement for a 
successful compromise [12]. So the main challenge we will have to solve is to 
leverage the limited process knowledge (i.e., high-level CCL graph) to identify 
suitable sensors to attack—without using attacker-side simulations to predict the 
outcome of process manipulations. If a near-optimal attack is reliably conducted 
on the first shot (even with limited knowledge on the system), our solution is 
considered successful. 


3.4 Identifying Near-Optimal Single-Shot Attacks in CCL Graphs 


We model ICSs as graph data structures that abstract the configuration of their 
closed control loops (CCLs). The graph abstraction of a single ICS might consist 
of multiple subgraphs. There are 5 types of nodes in our graphs that match the 5 
components of CCLs: static setpoints, sensors, control functions, actuators, and 
calculated setpoints. 

Formally, an ICS is modeled as a directed graph $G(V, E)$ where $V$ is a
nonempty set of vertices (or nodes) and $E$ is a set of edges. Every edge has
exactly two vertices in $V$ as endpoints. The direction of every edge $e \in E$
models the way information flows in the graph. There are 5 partitions $SS$,
$SE$, $F$, $A$ and $CS$ in $V$, that segregate the 5 types of nodes (static
setpoint, sensor, control function, actuator, and calculated setpoint,
respectively), such that $SS \cup SE \cup F \cup A \cup CS = V$. It is worth
noting that although we assume knowledge about the type of the nodes (e.g.,
sensor), we do not require further details about them (e.g., temperature sensor,
pressure sensor, etc.).
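
One possible in-memory encoding of this model is sketched below. The class name `CCLGraph`, its methods, and the node labels are assumptions made for illustration; the example encodes only the cyber-domain edges of the basic loop of Fig. 1 (setpoint and sensor feeding the control function, which drives the actuator), which is also an assumption of this sketch.

```python
from dataclasses import dataclass, field

# The five node types of the model: static setpoint, sensor, control function,
# actuator, and calculated setpoint.
NODE_TYPES = {"SS", "SE", "F", "A", "CS"}


@dataclass
class CCLGraph:
    node_type: dict = field(default_factory=dict)  # node label -> node type
    edges: set = field(default_factory=set)        # directed (source, target) pairs

    def add_node(self, name, kind):
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[name] = kind

    def add_edge(self, source, target):
        # Every edge must have both endpoints in V.
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("both endpoints must be nodes of the graph")
        self.edges.add((source, target))

    def partition(self, kind):
        """Return the set of nodes of the given type (e.g., 'SE' for sensors)."""
        return {n for n, k in self.node_type.items() if k == kind}


# Basic closed control loop: sp -> f, se -> f, f -> a.
basic = CCLGraph()
for name, kind in [("sp", "SS"), ("se", "SE"), ("f", "F"), ("a", "A")]:
    basic.add_node(name, kind)
for source, target in [("sp", "f"), ("se", "f"), ("f", "a")]:
    basic.add_edge(source, target)

print(basic.partition("SE"))
```

Because every node carries exactly one type, the five partitions are disjoint and their union is the whole node set, as the formal definition requires.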

There are different options to create CCL graphs. It is possible to extract the
CCL graph of ICSs from P&IDs. These diagrams show interconnected physical
instruments and might contain information about the CCLs used in the ICS.
Figure 3 shows an example of a P&ID in Fig. 3a, and its corresponding CCL
graph abstraction in Fig. 3b. Moreover, there are other options to create CCL
graphs. For instance, it is possible to create a CCL graph exclusively from
network traffic in the BACnet protocol [11]. A use case from a real BACnet
system is discussed and exemplified in Sect. 6.


(a) P&ID of a chemical plant. (b) Graph abstraction of the control system
derived from the P&ID.


Fig. 3. Piping and instrumentation diagram (P&ID) of an ICS and its corresponding 
graph abstraction. 


The proposed approach starts by searching for specific patterns in CCL 
graphs. These patterns relate to well understood weaknesses originally analyzed 
in IT systems. After that, a post-processing step is needed to filter out non- 
sensor targets. This is important since according to our attacker model only 
sensor nodes will be targeted. Finally, the attacker has to choose among the 
pre-selected sensors and has only one opportunity to compromise its integrity. 
In the next sections, we explain how to execute near-optimal single-shot attacks 
and exemplify the proposed approach using the system presented in Fig. 3. 


Pattern Matching. We propose to use the accumulated knowledge about soft- 
ware weaknesses in the IT domain and transfer it to the ICSs domain. In par- 
ticular, we leverage Mitre’s Common Weakness Enumeration (CWE) database, 
which specifies common IT weaknesses. While we later use several of those CWE 
patterns in our implementation, here we focus on two entries which proved par- 
ticularly useful experimentally. Our goal is to provide an intuitive comprehen- 
sion of the proposed approach without limiting its applicability to a specific 
subset of weaknesses. More formally, we identify each pattern $P_i$ with an index
$i = 1, \ldots, n$. Each pattern matching query on the graph returns a subset
$S_i \subseteq V$ of matching nodes using pattern $P_i$.


CWE-1108: Excessive Reliance on Global Variables. “The code is structured in 
a way that relies too much on using or setting global variables throughout various 
points in the code, instead of preserving the associated information in a narrower, 
more local context.” [22]. 

Global variables are generally considered a bad software engineering practice. 
Their main disadvantage is that malicious or benign-but-buggy changes to them 
will propagate and possibly disrupt many parts of the code. Global variables can 
be observed in CCL graphs mainly due to shared setpoints and/or sensors. As 
explained in Sect. 2, these are typical ways to combine CCLs in ICSs (see e.g., 
Fig. 2c and Fig. 2d). 

A suitable algorithm to identify global variables in CCL graphs is the
out-degree centrality. This algorithm assigns a score to each node by counting
its number of outgoing edges. More formally, for every node
$v \in V \setminus F$, the out-degree of $v$ is denoted as $d^+(v)$. We
explicitly disregard function nodes in set $F$ since this particular weakness is
exclusively about variables. We select as potential targets those nodes whose
value $d^+(v) > \tau$, for a context dependent threshold $\tau$.
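
The out-degree query can be sketched in a few lines. The function name and the example graph (a shared setpoint feeding two control functions, in the spirit of Fig. 2c) are assumptions for illustration; the threshold is passed in explicitly because it is context dependent.

```python
def global_variable_candidates(edges, node_type, threshold):
    """Return non-function nodes whose out-degree is strictly above threshold."""
    out_degree = {node: 0 for node in node_type}
    for source, _target in edges:
        out_degree[source] += 1
    return {
        node
        for node, degree in out_degree.items()
        if node_type[node] != "F" and degree > threshold  # function nodes excluded
    }


# Hypothetical shared-setpoint configuration: ss feeds both f1 and f2.
node_type = {"ss": "SS", "se1": "SE", "se2": "SE",
             "f1": "F", "f2": "F", "a1": "A", "a2": "A"}
edges = [("ss", "f1"), ("ss", "f2"), ("se1", "f1"), ("se2", "f2"),
         ("f1", "a1"), ("f2", "a2")]

print(global_variable_candidates(edges, node_type, threshold=1))
```

In this example only the shared setpoint has more than one outgoing edge, so it is the only node reported as a global variable.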


CWE-1109: Use of Same Variable for Multiple Purposes. “The code contains a 
callable, block, or other code element in which the same variable is used to control 
more than one unique task or store more than one instance of data.” [22]. 

Overloading a variable with multiple responsibilities might unnecessarily 
increase the complexity of the code around it. Such complexity becomes an 
indirect security issue since it can hide potential vulnerabilities. 

In control engineering, the usage of override controllers deliberately creates 
a pattern in which two or more control functions manipulate a single variable 
(see Sect. 2, Fig. 2b). The manipulated variable is commonly of type actuator or 
calculated setpoint. Due to the widespread implementation of override controllers 
in ICSs, it is possible to find this pattern in real CCL graphs. 

The automated identification of override controllers in a CCL graph can
be done by computing the in-degree centrality of every node
$v \in V \setminus F$. This algorithm assigns a score to each node by counting
its number of incoming edges. As in the previous case, we explicitly disregard
function nodes in set $F$ since this particular weakness is exclusively about
variables. We denote the number of incoming edges of a node as $d^-(v)$. Thus,
we look at nodes whose $d^-(v) \geq \tau$, where typically $\tau = 2$.
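
A corresponding sketch for the in-degree query follows. It assumes the "in-degree of at least two" reading that the section uses when it applies this pattern; the function name and the override-control example graph (two control functions writing to one actuator, in the spirit of Fig. 2b) are illustrative assumptions.

```python
def multi_purpose_candidates(edges, node_type, threshold=2):
    """Return non-function nodes whose in-degree is at least threshold."""
    in_degree = {node: 0 for node in node_type}
    for _source, target in edges:
        in_degree[target] += 1
    return {
        node
        for node, degree in in_degree.items()
        if node_type[node] != "F" and degree >= threshold  # function nodes excluded
    }


# Hypothetical override-control configuration: f1 and f2 both drive actuator a.
node_type = {"ss1": "SS", "se1": "SE", "ss2": "SS", "se2": "SE",
             "f1": "F", "f2": "F", "a": "A"}
edges = [("ss1", "f1"), ("se1", "f1"), ("ss2", "f2"), ("se2", "f2"),
         ("f1", "a"), ("f2", "a")]

print(multi_purpose_candidates(edges, node_type))
```

The control functions `f1` and `f2` also have an in-degree of two here, which shows why function nodes have to be excluded: every ordinary control function reads at least a setpoint and a sensor.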


Post-processing. According to our attacker model, it is a requirement to target
nodes of type sensor only. Although some of the nodes that satisfy a
weakness-related pattern might be of type sensor, other node types could be
chosen too. In fact, it is common to find setpoint nodes as global variables and
actuator nodes as multi-purpose variables.

For each node $v \in S_i$, we first check whether $v \in SE$. If $v$ is of type
sensor then $v$ is added to the result set $R_i \subseteq SE$. Otherwise, we
search for every sensor $u \in SE$ such that there is a path from $u$ to $v$,
and add $u$ to the result set $R_i$. The pseudocode of this algorithm is listed
in Algorithm 1.


Algorithm 1. Pattern matching post-processing.


input: set $S_i \subseteq V$ of nodes identified using $P_i$.
output: set $R_i \subseteq SE$ of candidate sensor targets according to $P_i$.
$R_i = \emptyset$
for all $v \in S_i$ do
    if $v \in SE$ then
        $R_i = R_i \cup \{v\}$
    else
        for all $u \in SE$ do
            if exist_path(from: u, to: v) then
                $R_i = R_i \cup \{u\}$
            end if
        end for
    end if
end for
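
Algorithm 1 could be realized, for instance, as follows. The sketch assumes that the graph is given as a list of directed edges and that path existence is checked with a breadth-first search; the function names and the example graph are illustrative and not the paper's implementation, which relies on graph database queries.

```python
from collections import deque


def exist_path(adjacency, source, target):
    """Breadth-first search for a directed path from source to target."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for successor in adjacency.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


def post_process(matched, sensors, edges):
    """Replace non-sensor matches with the sensors that can reach them."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    result = set()
    for v in matched:
        if v in sensors:
            result.add(v)
        else:
            result.update(u for u in sensors if exist_path(adjacency, u, v))
    return result


# Hypothetical override-control graph: the matched actuator "a" is replaced by
# the two sensors that influence it.
edges = [("ss1", "f1"), ("se1", "f1"), ("ss2", "f2"), ("se2", "f2"),
         ("f1", "a"), ("f2", "a")]
sensors = {"se1", "se2"}
print(post_process({"a"}, sensors, edges))
```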


Single-Shot Attacks. After the pattern matching phase, the attacker has to
choose one target from all the result sets obtained. To choose one target from
$\bigcup_{i=1}^{n} R_i$, we assign each sensor a score depending on how many
result sets $R_i$ it occurs in. The reasoning behind this scoring system is that
the sensors that have been identified by more weakness-related patterns have a
greater chance of attack success. This selection criterion creates a subset of
targets $T \subseteq \bigcup_{i=1}^{n} R_i$ among which only one has to be
selected. Without further insights about the targeted infrastructure besides the
CCL graph, it is hard to provide a meaningful sensor selection strategy from
$T$. However, we hypothesize that $T$ (1) will be smaller than the original
sensor set; and (2) will contain only sensors capable of causing near-optimal
shutdown times (SDTs). A smaller subset to choose sensors from gives the
attacker a probabilistic advantage over, e.g., an attacker that has to choose a
sensor from the whole sensor set $SE$. Moreover, since we hypothesize that all
sensors in $T$ are capable of causing near-optimal SDTs, a simple (e.g., random)
selection strategy is suitable from the attacker’s perspective.
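
The scoring step can be sketched as follows. The text does not spell out how the target subset is cut off from the scores, so this sketch assumes that the subset contains the sensors with the highest score; the function name and the example result sets are hypothetical.

```python
import random
from collections import Counter


def select_targets(result_sets):
    """Score each sensor by the number of result sets it occurs in.

    Returns the set of top-scoring sensors (an assumed cut-off) and the scores.
    """
    scores = Counter(sensor for rs in result_sets for sensor in set(rs))
    if not scores:
        return set(), scores
    best = max(scores.values())
    return {sensor for sensor, count in scores.items() if count == best}, scores


# Hypothetical result sets returned by two weakness-related patterns.
r1 = {"se1", "se2"}
r2 = {"se2", "se3"}
targets, scores = select_targets([r1, r2])

# Simple (random) choice of the single sensor to attack.
chosen = random.choice(sorted(targets))
print(targets, chosen)
```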

To compromise the integrity of a sensor, the attacker overwrites legitimate
sensor readings in such a way that all linked control functions will take the
attacker’s desired value instead. This value is fixed throughout the whole
period of the attack; however, the attacker is free to choose the value at will.


3.5 Motivating Example 


Figure 3a shows the model of a chemical plant originally described in [25]. Four
chemical components referred to as A, B, C, and D, are part of the process. The
first three components are combined in the reactor to create the final product
D. The goal of the software controlling the plant is to keep a stable,
high-quality production rate while minimizing the waste of raw material. There
are four control functions that take as input the values coming from three
sensors and four setpoints (1 calculated setpoint and 3 static setpoints). The
output of the functions aims to control three valves in the plant.

From a security perspective, there are several details of the plant that a
potential attacker would like to know to execute an attack. For example, the
types of sensors (e.g., pressure, temperature), the chemical reaction carried
out by the plant (i.e., A+C = D), the maximum pressure supported by the reactor
(3,000 kPa), etc. Detailed process knowledge has been used in previous works
to exemplify simulated attacks against this chemical plant [7,13]. Our approach,
however, assumes limited process knowledge. More precisely, we only assume
access to the CCL graph of the targeted infrastructure (Fig. 3b), which lacks all
of the previously mentioned details.

Figure 3b depicts the CCL graph of the chemical plant, which can be easily
derived from its piping and instrumentation diagram. This graph shows two
kinds of CCL combinations: (1) a shared sensor $y_5$ between control functions
Loop 4 and Loop 2; and (2) a cascade control in which control function Loop 4
sets a calculated setpoint $F_4^{sp}$ to control function Loop 1.

The proposed approach looks for weakness patterns in the chemical plant’s
CCL graph that could identify suitable sensors to target. The search for
multi-purpose variables (functions excluded) aims at nodes whose in-degree is
greater than or equal to two; no results are produced in this case. The search
for global variables (functions excluded) looks for nodes whose out-degree is
greater than a predefined threshold $\tau$. All setpoints have an out-degree of
one, which makes them unfit to be labeled as global variables. However, the
three sensors $y_4$, $y_5$, and $y_7$ have an out-degree of 1, 2, and 1,
respectively. This small sample of sensors shows an average out-degree of 1.33
with a standard deviation of 0.58. Defining $\tau$ as the sum of the average and
one standard deviation ($\tau = 1.91$) sets node $y_5$ as a potential target
according to the proposed approach. The result set would have sensor $y_5$ as an
ideal candidate to target in this particular infrastructure. This result is
concordant with previous works which state that “[i]n general we found that the
plant is very resilient to attacks on $y_7$ and $y_4$... If the plant operator
only has enough budget to deploy advanced security mechanisms for one sensor
(e.g., tamper resistance, or TPM chips), $y_5$ should be the priority” [7].
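
The arithmetic of this example can be reproduced with a short script. It assumes that the reported standard deviation is the sample standard deviation (which matches the stated value of 0.58); the variable names are illustrative.

```python
from statistics import mean, stdev

# Out-degrees of the three sensors in the example CCL graph.
out_degree = {"y4": 1, "y5": 2, "y7": 1}
values = list(out_degree.values())

average = mean(values)        # 1.33 when rounded to two decimals
deviation = stdev(values)     # sample standard deviation, 0.58 when rounded
tau = average + deviation     # 1.91 when rounded

# Sensors whose out-degree exceeds the threshold are potential targets.
targets = [sensor for sensor, degree in out_degree.items() if degree > tau]
print(round(average, 2), round(deviation, 2), round(tau, 2), targets)
```

Only the sensor with an out-degree of two exceeds the threshold, so it is the single candidate returned for the global-variable pattern.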


4 Implementation 


We implement the proposed approach on top of Neo4j version 4.1.2 [24]. Neo4j
is a NoSQL database engine specialized in graph data structures. Neo4j offers a
natural way to store CCL graphs and a high-level query language that allows
complex queries to be expressed in just a few lines of code.

We assume that the attacker already has access to the CCL graph of the
targeted infrastructure and has persisted it in a Neo4j database. Without any
further knowledge, the selection of the sensors to attack is done through a set
of queries on the graph. Such queries must be written in Neo4j’s query language
called Cypher Query Language. Figure 4 depicts the overall process to obtain a
subset $R_i$ of candidate sensor targets according to a pattern $P_i$.

Our implementation includes a pre-processing stage that pre-computes
information required in the pattern matching phase. Since both weakness-related
patterns discussed in Sect. 3.4 are based on the in- and out-degree centrality
algorithms, the pre-processing stage consists of queries that assign the in- and
out-degree to every node in the graph.

The pattern matching phase consists of finding nodes that satisfy a specific
condition in the graph’s topology. These particular conditions pinpoint nodes
of interest from the attacker’s perspective. For the sake of brevity, here we
discuss two weaknesses, namely, global variables and multi-purpose variables.
Global variables are nodes whose out-degree is greater than a context dependent
threshold $\tau$. In our implementation, we define $\tau$ as the average plus
one standard deviation of the out-degree of nodes segregated per type.
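
The per-type threshold can be sketched in plain Python as shown below; this is not the graph-database query used in the implementation. The sketch assumes the sample standard deviation, treats a type with a single node as having zero deviation, and uses hypothetical setpoint labels (`sp1`, `sp2`, `sp3`, `csp`) alongside the three sensors of the motivating example.

```python
from collections import defaultdict
from statistics import mean, stdev


def thresholds_per_type(out_degree, node_type):
    """Average plus one standard deviation of the out-degree, per node type."""
    by_type = defaultdict(list)
    for node, degree in out_degree.items():
        by_type[node_type[node]].append(degree)
    thresholds = {}
    for kind, degrees in by_type.items():
        # Assumption: a type with a single node has no spread.
        spread = stdev(degrees) if len(degrees) > 1 else 0.0
        thresholds[kind] = mean(degrees) + spread
    return thresholds


def global_variables(out_degree, node_type):
    """Non-function nodes whose out-degree exceeds the threshold of their type."""
    tau = thresholds_per_type(out_degree, node_type)
    return {
        node
        for node, degree in out_degree.items()
        if node_type[node] != "F" and degree > tau[node_type[node]]
    }


node_type = {"y4": "SE", "y5": "SE", "y7": "SE",
             "sp1": "SS", "sp2": "SS", "sp3": "SS", "csp": "CS"}
out_degree = {"y4": 1, "y5": 2, "y7": 1,
              "sp1": 1, "sp2": 1, "sp3": 1, "csp": 1}

print(global_variables(out_degree, node_type))
```

Computing the threshold per type keeps node types with naturally different connectivity (e.g., sensors versus setpoints) from being compared against each other.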

The second pattern matching query looks for multi-purpose variables (e.g.,
override controllers). This pattern is simpler since we only need to find nodes
with an in-degree greater than or equal to 2.

Lastly, the post-processing phase replaces non-sensor nodes found during the 
pattern matching phase, with sensor nodes that have a path to them, thus, 
influencing their behavior. This ensures that the list of targets for each weakness- 
related pattern is comprised exclusively of sensor nodes. 


Fig. 4. Implementation of the proposed approach.


5 Experimental Evaluation 


We evaluate the proposed approach in a realistic industrial control system. We
use this environment to perform experimental attacks against all relevant sensors
to obtain a ground truth about the severity of the attacks in terms of plant
availability. Due to the large number of attacks required to obtain the ground
truth, we opted for a simulated environment where the attacks can be executed
without safety concerns. The simulated plant is known as the Tennessee Eastman
Plant and has been extensively used in previous cybersecurity research
[7,13,15,16,28].


5.1 Tennessee Eastman Plant 


The seminal 1993 paper by Downs and Vogel describes the Tennessee Eastman 
Plant (TEP) [10]. Their description includes, among other details, the expected 
input and output of the plant, each step of the process from start to end, and 
the hardware available to control the process. The control hardware includes 
41 sensors and 12 actuators. Their paper describes 20 disturbances commonly
found in real chemical plants, for instance, sticky valves, changes in chemical
reaction kinetics, and random variations in the composition of input streams.
Each disturbance has a unique numeric identifier in the range [1-20]. Finally,
there are process operating constraints (e.g., maximum reactor temperature, 
maximum reactor pressure, etc.) that must be satisfied at all times or the plant 
shuts down. Downs and Vogel present the challenge of implementing a control 
strategy for the TEP. To ease engagement in the challenge, the authors provide 
software source code to simulate the core components of the TEP such as the 
sensors, actuators, disturbances, and process operating constraints, leaving space 
for the missing control strategy. 

Several authors have proposed control strategies for the TEP [17, 20, 21, 26].
The main differences between control strategies are the robustness against
external disturbances, the optimization objectives, and the mechanisms to set the
production rate. The TEP challenge is considered an open-ended problem
without a unique correct solution [20].

In this work, we perform an extensive analysis of two control strategies for the
TEP in terms of plant availability upon sensor integrity attacks. The first
strategy, proposed by Larsson et al. [17], is available in the MATLAB/Simulink
environment [27].¹ The second strategy, proposed by Luyben et al., is available in the
Fortran programming language [20]. We translated it to the MATLAB/Simulink
environment and published it.²


5.2 Experimental Attacks 


All simulations are executed in the MATLAB/Simulink environment. Specifically,
we use MATLAB version R2015a running on Windows 10. The simulations
are configured to run for 72 h under attack. However, some of the attacks cause
violations of the process operating constraints. As a consequence, some
simulations stop earlier than expected. We refer to the time elapsed between the beginning
of the attack and the moment the simulation stops as the shutdown time (SDT).

An experiment is a set of simulations using the same environmental condi- 
tions (i.e., disturbances) and attack strategy. In our experiments, we execute sim- 
ulations with and without disturbances. The disturbances considered are those 
in the range [1-13] and are executed one at a time. According to the original
TEP paper, disturbances in the range [14-20] should be used in conjunction with
other disturbances [10]. The combinatorial explosion that this requirement implies deters us
from executing simulations with disturbances in the range [14-20]. The attack
strategy is the way in which the attacker chooses the value used to compromise 
the integrity of sensors. We use three different attack strategies. First, assuming 
that the attacker does not have any knowledge about the targeted sensor, we 
choose the constant 127. This number is small enough to fit in 1 byte (signed 
int) which ensures that most industrial protocols will deliver the malicious value 


¹ Simulation and results available at https://gitlab.com/eastman_tennessee/larsson.
² Simulation and results available at https://gitlab.com/eastman_tennessee/luyben.
in a single packet, thus executing a stealthy attack. For the second and third
attack strategies, we assume that the attacker knows historic sensor readings, 
in which case we choose the minimum and maximum values observed per sen- 
sor. Although this additional knowledge is not required by our approach, we use 
these attack strategies to compare our results with previous works. 
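
The three attack strategies described above can be sketched as follows. This is only an illustrative sketch: the function name `attack_value` and the sample readings are hypothetical and do not come from the published simulations.

```python
from typing import Sequence

# 127 is the largest value representable as a signed 8-bit integer.
CONSTANT_ATTACK_VALUE = 127


def attack_value(strategy: str, history: Sequence[float] = ()) -> float:
    """Return the value injected in place of the genuine sensor reading."""
    if strategy == "constant":
        # No knowledge about the targeted sensor is needed.
        return CONSTANT_ATTACK_VALUE
    if not history:
        raise ValueError("min/max strategies need historic sensor readings")
    if strategy == "min":
        return min(history)
    if strategy == "max":
        return max(history)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical historic readings of one sensor.
readings = [2701.3, 2705.8, 2698.4, 2710.1]
for name in ("constant", "min", "max"):
    print(name, attack_value(name, readings))
```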

After the experimental setup has been defined, we execute the attacks against 
each sensor, one at a time, and record its SDT (or 72h if no shutdown happens). 
If a simulation finishes at 72h, we assume that the attack does not cause a 
shutdown. The experiment finishes when all sensors used by the control strategy 
have been attacked. We rank the targets according to their SDT to identify the 
fastest SDT in the experiment and the sensor that causes it. We refer to them 
as the optimal SDT and the optimal target, respectively. In general, we refer to 
the top-3 fastest attacks as near-optimal attacks. 

To put our results in perspective, we compare the chances of achieving a
near-optimal attack for two attackers: one using our approach (with the capabilities
described in Sect. 3.2) and one who picks a single sensor to target uniformly at
random. We assume that both attackers use the same attack strategy.
Hereafter, we refer to the latter simply as the random attacker.
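
The comparison between the two attackers can be illustrated with a small sketch. The SDT values and sensor names below are hypothetical placeholders, and the helper names are ours; only the definitions (near-optimal means top-3 fastest SDT, the random attacker picks uniformly) follow the text.

```python
def near_optimal_targets(sdt, k=3):
    """Return the k sensors with the shortest shutdown time (SDT)."""
    return set(sorted(sdt, key=sdt.get)[:k])


def chance_of_near_optimal(candidates, sdt, k=3):
    """Probability that a uniform pick from `candidates` is near-optimal."""
    candidates = set(candidates)
    return len(near_optimal_targets(sdt, k) & candidates) / len(candidates)


# Hypothetical SDTs in hours; 72.0 stands for "no shutdown".
sdt_hours = {"s1": 0.2, "s2": 5.0, "s3": 0.4, "s4": 72.0, "s5": 1.5, "s6": 9.0}

random_attacker = chance_of_near_optimal(sdt_hours, sdt_hours)
informed_attacker = chance_of_near_optimal({"s1", "s3"}, sdt_hours)
print(random_attacker, informed_attacker)
```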


Control Strategy #1. The control strategy by Larsson et al. [17] uses only
9 actuators and 16 sensors out of the 12 actuators and 41 sensors available in
the TEP. Additionally, there are 20 control functions, 9 static setpoints, and 12
calculated setpoints. The CCL graph of this control strategy comprises 66
nodes divided into 3 subgraphs. An illustration of the graph is shown in Fig. 5.

We use the queries detailed in Sect. 4 to identify potential targets. First,
we compute the in- and out-degree centrality for all the sensors, setpoints, and
actuators. Then, we run the pattern matching queries that aim at finding global
variables and multi-purpose variables. The first query identifies the calculated
setpoint number 12 (located in the middle of the largest subgraph) as a global
variable. Although there are 16 sensors used in this control strategy, the
post-processing phase identifies that only one of them has a path to the global variable:
sensor 17 (i.e., R₁ = {17}). Such a path can be visually confirmed by following the
direction of the edges in Fig. 5. No multi-purpose variables are identified in this
control strategy (i.e., R₂ = ∅). Thus, the final target set T = {17} contains only
one sensor, which makes the target selection easier for a single-shot attack.
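
The following sketch mimics the two steps just described (degree computation, then a path search from sensors to a global variable) on a toy graph. It is not the Neo4j implementation used in this work: the graph, the node names, and the out-degree threshold used to flag a global variable are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy CCL graph: an edge (u, v) means "u feeds v".
EDGES = [
    ("sensor_a", "ctrl_1"), ("setpoint_1", "ctrl_1"), ("ctrl_1", "calc_sp"),
    ("calc_sp", "ctrl_2"), ("calc_sp", "ctrl_3"), ("calc_sp", "ctrl_4"),
    ("sensor_b", "ctrl_2"), ("ctrl_2", "actuator_x"),
    ("sensor_c", "ctrl_3"), ("ctrl_3", "actuator_y"),
    ("sensor_d", "ctrl_4"), ("ctrl_4", "actuator_z"),
]


def out_degree(edges):
    """Out-degree of every node (nodes without outgoing edges get 0)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg.setdefault(v, 0)
    return deg


def has_path(edges, src, dst):
    """Breadth-first search following the direction of the edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


deg = out_degree(EDGES)
# Assumed rule: a node read by at least 3 other nodes counts as "global".
global_vars = [n for n, d in deg.items() if d >= 3]
sensors = [n for n in deg if n.startswith("sensor_")]
targets = {s for s in sensors for g in global_vars if has_path(EDGES, s, g)}
print(global_vars, targets)
```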

We hypothesize that sensor 17 is a near-optimal target for control strategy 
#1. To test our hypothesis we execute 28 experiments comprised of 448 individ- 
ual simulations that account for 11,047.465 simulated hours (~1.26 years). For 
the first two experiments we use the constant value attack strategy. In one of the 
experiments we set ideal environmental conditions (no disturbance) and in the 
other experiment we enable disturbance #8. We choose disturbance #8 because 
previous works have used exclusively this disturbance for their experiments. The 
results of the first two experiments are shown in Table 2. Regardless of the
environmental conditions, the results are consistent in the top half of the table, with
greater variations in the bottom half. In these two experimental settings, a
random attacker would have a 1/16 chance (≈6%) of choosing sensor 40 (which does
not cause a shutdown) and a 3/16 chance (≈18%) of choosing a near-optimal target.
Considering only the 15 sensors that do cause a shutdown, a random attacker
would achieve an average SDT of 2.66 h without disturbance and 2.52 h under
disturbance #8. On the other hand, an attacker using our approach has to choose
one sensor from T = {17}, which ensures a 100% chance of finding a near-optimal
target, with an SDT of 0.192 h (≈11 min) in both cases, about
5 min behind the optimal target (sensor 9, with an SDT of 0.101 h).

Previous works have used the minimum and maximum value attack strate- 
gies against this control strategy. Unlike previous works that have used only 
one environmental condition to execute their experiments (disturbance #8), we 
execute our experiments under 13 different environmental conditions (including 
disturbance #8) to gather data from more diverse scenarios and gain more con- 
fidence in our results. Disturbance #6 is excluded because it is not supported 
by this particular control strategy, which means that a shutdown happens even 
without any attacks [17]. Due to space constraints, we summarize our results per
attack strategy, reporting the average SDT and standard deviation for each
sensor across all the simulations. The results, detailed in Tables 3 and 4, show
that sensor 17 is a near-optimal target with an average SDT of 1.21 h and 1.07 h,
respectively. For the minimum value attack strategy, a random attacker would
achieve an average SDT of 24.23 h, whereas for the maximum value attack
strategy the average SDT is 24.87 h (excluding sensor 1). Finally, for both the minimum
and maximum attack strategies, sensor 17 is ranked in the second position. This
confirms that sensor 17 is a near-optimal target not only during specific plant
conditions, but in many different situations.


Table 2. Experiments using the constant value attack strategy under two
different environmental conditions. Sensor 17, identified as a near-optimal
target, is highlighted (*).

     No disturbance                  Disturbance 8
Sensor   SDT (h)                Sensor   SDT (h)
9        0.101                  9        0.101
14       0.181                  14       0.181
17 *     0.192                  17 *     0.192
11       0.426                  11       0.427
8        0.431                  8        0.430
4        0.526                  4        0.526
31       0.557                  31       0.560
12       0.569                  12       0.567
3        1.604                  3        1.607
2        [illegible]            2        [illegible]
15       2.126                  15       2.052
1        7.336                  5        [illegible]
5        7.609                  7        6.971
10       8.182                  10       7.249
7        8.297                  1        9.691
40       72 (no shutdown)       40       72 (no shutdown)

Fig. 5. CCL graph of control strategy #1. The color of each node represents its
type, as described in Fig. 2. (Color figure online)




Table 3. Summary of experiments considering the minimum value attack strategy
under 13 different environmental conditions. Sensor 17, identified as a
near-optimal target, is highlighted (*).

Sensor   Avg. SDT (h)   Std. Deviation
4        0.79           0.67
17 *     1.21           0.76
9        2.06           0.73
8        2.66           0.41
3        4.11           0.35
2        4.55           0.75
7        7.70           2.19
12       7.97           2.25
14       8.94           2.39
15       11.18          3.96
5        14.53          17.49
31       55.15          27.74
11       66.64          19.31
40       66.69          19.14
10       66.73          19.02
1        66.78          18.82

Table 4. Summary of experiments considering the maximum value attack strategy
under 13 different environmental conditions. Sensor 17, identified as a
near-optimal target, is highlighted (*).

Sensor   Avg. SDT (h)        Std. Deviation
9        0.57                0.19
17 *     1.07                0.14
4        1.27                0.21
3        2.60                0.42
8        2.83                0.46
2        3.32                0.66
12       5.44                1.36
14       8.79                2.45
15       11.10               4.36
5        12.64               17.91
31       56.78               27.62
11       66.59               19.50
10       66.64               19.34
40       66.70               19.10
7        66.73               19.00
1        72 (no shutdown)    0.00


Control Strategy #2. The second control strategy, proposed by Luyben et 
al. [20], uses 10 sensors and 10 actuators from those available in the plant. 
This control strategy requires 13 static setpoints, 13 control functions, and 1 
calculated setpoint. In total, the CCL graph is comprised of 47 nodes. 

As in the previous control strategy, we use the queries detailed in Sect. 4
to identify sensor targets, starting with the computation of the in- and
out-degree centrality for all the sensors, setpoints, and actuators. The query
regarding global variables identifies sensors 8, 12, and 15 (i.e., R₁ = {8, 12, 15}). The
query regarding multi-purpose variables identifies actuators 1, 2, 7, and 11. The
post-processing phase looks for sensor nodes that have a path to these actuators
and identifies sensors 8, 12, 15, and 29 (i.e., R₂ = {8, 12, 15, 29}). As described
in Sect. 3.4, the final target subset T contains those sensors identified by most
weakness-related patterns. In this particular case, T = {8, 12, 15} because these
sensors occur in both R₁ and R₂. Finally, the attacker selects one target t ∈ T
at random.
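
The selection rule can be sketched in a few lines: count how many weakness-related patterns flag each sensor and keep the sensors with the highest count. The function name `final_targets` is ours; the result sets are those reported in the text for both control strategies.

```python
from collections import Counter


def final_targets(*pattern_results):
    """Keep the sensors flagged by the largest number of weakness patterns."""
    counts = Counter(s for result in pattern_results for s in set(result))
    if not counts:
        return set()
    best = max(counts.values())
    return {sensor for sensor, c in counts.items() if c == best}


# Control strategy #2: both patterns return results.
R1 = {8, 12, 15}
R2 = {8, 12, 15, 29}
print(final_targets(R1, R2))

# Control strategy #1: only the global-variable pattern returns a sensor.
print(final_targets({17}, set()))
```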

We hypothesize that sensors 8, 12, and 15 are near-optimal targets against 
control strategy #2. To test our hypotheses we execute 30 experiments comprised 
of 300 individual simulations that account for 11,079.163 simulated hours (~1.26 
years). As in the previous control strategy, we begin with two experiments using
the constant value attack strategy under two different environmental conditions:
the first experiment runs without any disturbance and the second under
disturbance #8. The results of the first two experiments, detailed in Table 5, show
no significant differences between the two plant conditions. In these environmental
conditions, the random attacker has a 30% chance of choosing a near-optimal
target (3 out of 10 sensors), and the same chance of choosing a sensor that does
not cause a shutdown. On the other hand, an attacker using our approach has
a 66% chance of finding a near-optimal target (2 out of 3 sensors), since sensors
12 and 15 are near-optimal but sensor 8 is not. The random attacker can achieve
an average SDT of 1.59 h without disturbance and 1.51 h under disturbance #8
(excluding the 3 sensors that do not cause a shutdown). By comparison, an
attacker using our approach achieves an average SDT of about 0.50 h in both
cases, more than 1 h faster (Fig. 6).

As for control strategy #1, we execute additional experiments using the 
minimum and maximum attack strategies. This time we use all disturbances 
in the range [1-13] because this control strategy is able to handle all of them. 
Thus, we execute two attack strategies over 14 environmental conditions (no 
disturbance + 13 disturbances), which adds up to 28 additional experiments. 
Again, due to space constraints, we summarize our results per attack strategy.
Tables 6 and 7 show the average SDT and standard deviation for each sensor
attack across all the simulations. As in the first two experiments, for the
minimum and maximum attacks, the random attacker has a 30% chance of finding
a near-optimal target but also a 30% chance of finding a target that does not
cause a shutdown. For our attacker, however, the chances of finding a
near-optimal target differ per attack strategy. The minimum value attack gives our
attacker only a 1-in-3 chance (33%) of picking the near-optimal sensor 15,
a marginal benefit with respect to the random attacker. However, our attacker
does not face the 30% risk of a no-shutdown attack. After excluding the 3

Table 5. Experiments using the constant value attack strategy under two
different environmental conditions. The 3 sensors identified as near-optimal
targets are highlighted (*).

     No disturbance                  Disturbance 8
Sensor   SDT (h)                Sensor   SDT (h)
7        0.144                  7        0.144
15 *     0.161                  15 *     0.161
12 *     0.311                  12 *     0.311
9        0.440                  9        0.440
8 *      1.031                  8 *      1.032
11       2.486                  11       2.647
23       [illegible]            23       [illegible]
18       72 (no shutdown)       18       72 (no shutdown)
29       72 (no shutdown)       29       72 (no shutdown)
30       72 (no shutdown)       30       72 (no shutdown)

Fig. 6. CCL graph of control strategy #2. The color of each node represents its
type, as described in Fig. 2. (Color figure online)
Table 6. Summary of experiments considering the minimum value attack strategy
under 14 different environmental conditions. The 3 sensors identified as
near-optimal targets are highlighted (*).

Sensor Id.   Avg. SDT (h)        Std. Deviation
15 *         0.65                0.20
9            1.28                0.51
7            1.72                0.79
12 *         4.25                1.72
8 *          8.29                1.98
23           63.03               22.83
30           66.01               16.82
29           72 (no shutdown)    0.00
11           72 (no shutdown)    0.00
18           72 (no shutdown)    0.00

Table 7. Summary of experiments considering the maximum value attack strategy
under 14 different environmental conditions. The 3 sensors identified as
near-optimal targets are highlighted (*).

Sensor Id.   Avg. SDT (h)        Std. Deviation
12 *         0.69                0.16
15 *         0.70                0.07
9            1.09                0.40
8 *          4.48                0.55
23           52.01               29.46
7            55.30               27.60
29           67.46               16.97
30           72 (no shutdown)    0.00
11           72 (no shutdown)    0.00
18           72 (no shutdown)    0.00

sensors that do not cause a shutdown, the average SDT of the random attacker
is 20.75 h, significantly larger than the average SDT of 4.4 h for our attacker.
For the maximum value attack, our attacker has a 2-in-3 chance of picking a
near-optimal target. After excluding the 3 sensors that do not cause a shutdown,
the average SDT of the random attacker is 25.96 h. By contrast, an attacker
using our approach achieves an average SDT of 1.96 h.
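
The averages quoted above can be recomputed from the per-sensor values in Tables 6 and 7. The sketch below is only a consistency check under two assumptions: sensors without a shutdown are encoded as `None` and excluded, and the sensor listed with 55.30 h in Table 7 is sensor 7 (inferred by elimination, since its identifier is garbled in the extracted table).

```python
# Average SDT in hours per sensor; None means "no shutdown within 72 h".
MIN_ATTACK = {15: 0.65, 9: 1.28, 7: 1.72, 12: 4.25, 8: 8.29,
              23: 63.03, 30: 66.01, 29: None, 11: None, 18: None}
MAX_ATTACK = {12: 0.69, 15: 0.70, 9: 1.09, 8: 4.48, 23: 52.01,
              7: 55.30, 29: 67.46, 30: None, 11: None, 18: None}

T = {8, 12, 15}  # targets selected by the proposed approach


def average_sdt(table, sensors=None):
    """Mean SDT over the given sensors, skipping no-shutdown entries."""
    values = [v for s, v in table.items()
              if v is not None and (sensors is None or s in sensors)]
    return sum(values) / len(values)


for name, table in (("minimum", MIN_ATTACK), ("maximum", MAX_ATTACK)):
    random_avg = average_sdt(table)      # random attacker
    ours_avg = average_sdt(table, T)     # attacker restricted to T
    print(f"{name}: random {random_avg:.2f} h, ours {ours_avg:.2f} h")
```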


6 Discussion 


CCL Graphs. The first challenge faced by graph-based studies is the creation of 
the graph itself. There are different possibilities to create CCL graphs like those 
used in this work. Piping and Instrumentation Diagrams (P&IDs) are a suitable 
alternative because ICS documentation can be obtained through a variety of 
illegal means (e.g., phishing, social engineering, bribes) or simply downloaded 
from public repositories [14]. The creation of a CCL graph from a P&ID could 
even be automated using diagram digitization techniques such as [5]. 

Another way to create CCL graphs leverages the rich semantics of some 
industrial communication protocols. A concrete example is the BACnet proto- 
col (ISO 16484-5), commonly used to automate diverse services in hospitals, 
airports, and other buildings [4]. In these environments, the CCL programming 
pattern is extensively used. For that reason, the BACnet protocol implements an 
application layer object called Loop, which eases the implementation of CCLs. 
This object contains properties that point to other BACnet objects abstract- 
ing CCL components such as sensors, setpoints, and actuators. Since BACnet 
objects are regularly exchanged through the network, it is possible to create
CCL graphs in a fully automated way simply by sniffing the traffic [11]. We
used this method to create the CCL graph of a real BACnet system comprising
more than 20 buildings located at the University of Twente. We can confirm
that this method is capable of creating large CCL graphs (4,771 nodes) just by
passively listening to the network traffic. In general, we note similar structures in
the BACnet graphs and the TEP graphs, which suggests that
the proposed approach could be applied to other systems besides ICSs.
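
As a rough illustration of how Loop objects translate into CCL graph edges, the sketch below assumes that the Loop objects have already been decoded from captured traffic into plain dictionaries. The dictionary layout and object identifiers are hypothetical; the three reference fields mirror the controlled-variable, setpoint, and manipulated-variable references of the BACnet Loop object. This is not the tooling used for the University of Twente graph.

```python
def loops_to_edges(loops):
    """Turn decoded Loop records into directed CCL graph edges."""
    edges = set()
    for loop in loops:
        loop_id = loop["object_id"]
        # Sensor and setpoint feed the control function ...
        edges.add((loop["controlled_variable_reference"], loop_id))
        edges.add((loop["setpoint_reference"], loop_id))
        # ... and the control function drives the actuator.
        edges.add((loop_id, loop["manipulated_variable_reference"]))
    return edges


# Hypothetical records: loop:1 writes analog-value:5, which is the
# setpoint of loop:2 (a cascade of two closed control loops).
decoded_loops = [
    {"object_id": "loop:1",
     "controlled_variable_reference": "analog-input:1",
     "setpoint_reference": "analog-value:1",
     "manipulated_variable_reference": "analog-value:5"},
    {"object_id": "loop:2",
     "controlled_variable_reference": "analog-input:2",
     "setpoint_reference": "analog-value:5",
     "manipulated_variable_reference": "analog-output:1"},
]

ccl_edges = loops_to_edges(decoded_loops)
print(sorted(ccl_edges))
```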


Additional CWE Weaknesses. To describe the proposed approach, we elaborate 
on two weaknesses from Mitre’s CWE database, in particular, global variables 
(CWE-1108) and multi-purpose variables (CWE-1109) [22]. We emphasize these 
two because they proved particularly useful to find weaknesses in both TEP 
implementations analyzed in our evaluation. Although it is not our goal to pro- 
vide an exhaustive list of software weaknesses that can be mapped to ICSs, here 
we discuss additional weaknesses that could be observed in CCL graphs. 

Circular dependencies (CWE-1047) happen when “[t]he software contains
modules in which one module has references that cycle back to itself.” [22]. In
ICSs, circular dependencies can occur, for example, through control functions
that write their output to a calculated setpoint node that, in turn, is the input
of another control function node and so on, until at some point the sequence
of references returns to the initial node. Although none of the CCL graphs
analyzed showed a circular dependency pattern, we found a similar structure in the
BACnet system previously discussed.
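
A circular dependency of this kind corresponds to a directed cycle in the CCL graph, which a depth-first search can find. The sketch below uses a hypothetical edge list and is not the pattern-matching query used in this work.

```python
def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if acyclic."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in adj}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in adj[node]:
            if color[nxt] == GREY:  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in adj:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical chain of control functions and calculated setpoints
# whose references return to the initial node.
cyclic = [("ctrl_1", "calc_sp_1"), ("calc_sp_1", "ctrl_2"),
          ("ctrl_2", "calc_sp_2"), ("calc_sp_2", "ctrl_1")]
print(find_cycle(cyclic))
```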

Deep nesting (CWE-1124) manifests in software that “contains a callable or
other code grouping in which the nesting/branching is too deep.” [22]. The
software implementation of an ICS might contain a deep nesting weakness whenever a
long sequence of closed control loops is chained together, for example, when several
cascade controllers are concatenated. In this setting, the precise definition of ‘long
sequence’ is determined by a context-dependent threshold.
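
One possible way to flag such chains is to measure the longest directed path in an acyclic CCL graph and compare it with a threshold. Both the example graph and the threshold value below are hypothetical; the sketch assumes the graph has no cycles.

```python
from functools import lru_cache


def longest_chain(edges):
    """Number of nodes on the longest directed path (graph must be acyclic)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])

    @lru_cache(maxsize=None)
    def depth(node):
        return 1 + max((depth(nxt) for nxt in adj[node]), default=0)

    return max((depth(node) for node in adj), default=0)


# Hypothetical cascade: each controller writes the setpoint of the next one.
cascade = [("sensor_1", "ctrl_1"), ("ctrl_1", "sp_2"), ("sp_2", "ctrl_2"),
           ("ctrl_2", "sp_3"), ("sp_3", "ctrl_3"), ("ctrl_3", "actuator_1")]

THRESHOLD = 5  # assumed, context-dependent
chain_length = longest_chain(cascade)
print(chain_length, chain_length > THRESHOLD)
```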


Countermeasures. The most promising way to thwart our approach to
identifying near-optimal single-shot attacks is to prevent the attacker
from learning the CCL graph representation of the targeted infrastructure
in the first place. However, for many ICSs, P&IDs are publicly available,
and an attacker can use them to generate a corresponding CCL graph as
described before. Therefore, it would be necessary to keep such P&IDs secret
and to protect them from being leaked to attackers. We stress that further
protective measures might be needed to prevent an attacker from
generating the CCL graph from other information. In building automation
systems that use the BACnet protocol, for instance,
eavesdropping on the traffic is already sufficient to automatically generate CCL
graphs [11].

Our approach relies on identifying specific weakness patterns in the CCL 
graph which can be matched with well-known software weaknesses in the IT 
domain (e.g., via Mitre’s CWE database). Therefore, if a CCL graph does not
contain any of these matching weaknesses, then our approach would fail to
identify near-optimal single-shot attacks with a better probability than simply
picking any sensor from the set of all sensors at random. This means that if
an attacker cannot be prevented from creating the CCL graph, then another
way to thwart our method is to ensure that the CCL graph contains
no weakness patterns. One option to achieve this is to
re-engineer the software controlling the ICS (as we saw, several control
strategies are possible for ICSs such as the TEP). However, designing
security-aware control strategies that also meet operational and economic requirements is
cumbersome [15], and completely avoiding certain weakness patterns could even
be impossible in certain settings.

If all of the above fails and the attacker is able to perform the identified 
near-optimal single-shot attack, then an operator can try to detect the attack 
as soon as possible by using monitoring systems to detect manipulated sensor 
values [7]. However, the detection time of such systems can be in the order of
hours, which would be too slow for near-optimal attacks that, depending
on the concrete case, can bring down the ICS within minutes. One
idea to improve detection would be to tighten the monitoring specifically at
those components of the ICS that our CCL-graph-based attack method matches
with known software weaknesses.


7 Related Work 


Several researchers have analyzed different security aspects of the TEP. Here we
focus on a subset of those works that have studied diverse attacks against the
TEP. In [15], the goal was to gain insights into the resilience of the physical process
under attack. Similar to our attacker model, the authors focused on compromising
sensors and identifying those best able to cause a shutdown of the targeted
infrastructure. In this particular case, the targeted infrastructure was Larsson’s
implementation of the TEP, described in our evaluation as control strategy #1.
In line with our results, they found that sensors 4, 9, and 17 are the best
targets under the minimum and maximum value attack strategies. However, they
reached that conclusion by using much more knowledge than our approach, i.e.,
a fully-fledged simulation of the targeted infrastructure. To
build an accurate simulation of the targeted infrastructure, an attacker would
need access to, at least, the full PLC(s) configuration, P&IDs, and documentation
about the system’s constraints.

In another work, the goal was to find out the right time to launch an attack on 
individual sensor signals to cause a shutdown of the targeted infrastructure [16]. 
In this case, again, the target was Larsson’s implementation of the TEP. The 
authors focused on DoS attacks on sensor signals, which forces control functions 
to use a stale value from the sensor. Under the assumption that launching sensor 
attacks at minimum or maximum peaks is the fastest way to cause a shutdown 
of the plant, the goal was to identify such peaks in real time. They approached 
this challenge using the Best Choice Problem (BCP) methodological framework. 
Using different learning windows, they identified sensors 4, 9, and 17 as the 
best candidate targets (i.e., fastest SDT). To execute this approach, a potential
attacker might need at least network traffic access, a notion about the physical
process and its potential disturbances (e.g., from P&IDs), and ideally, some
historic data. Moreover, this approach has three main disadvantages besides the
increased attacker knowledge about the plant. First, it is focused on identifying
the right time to launch an attack but does not indicate the optimal target.
Launching an attack on the wrong target might not cause any impact whatsoever
on the plant. Only after executing attacks on all sensors does it become clear which
of them are optimal, which is unrealistic. Second, the time to attack is lengthy
(this includes the learning window and the selection of the moment to launch 
the attack), ranging from 9.61h to 27.17h in their experiments. Third, in some 
circumstances their approach might not reach a conclusion, in which case the 
attacker “has to choose a clearly suboptimal candidate (last sample in the attack 
window) or decide to not launch an attack.” [16]. 

The works in [7] and [13] analyze diverse attacks against a simplified
version of the TEP. Both works incorporate detailed knowledge of the process
dynamics to execute optimal attacks. An attacker leveraging the techniques
proposed in these two works would require at least access to the PLC(s) configuration,
historic data, and documentation about the system’s constraints. We deem the
simplified version of the TEP so small that we use it only in our motivating
example (Sect. 3.5). However, as we showed, our approach is also applicable to
this plant, and our results match those obtained by previous works while
using limited knowledge.


8 Conclusion 


In this work, we investigated whether constrained attackers without detailed
system knowledge and simulators can identify near-optimal attacks. In contrast
to attacks in prior work (which require precise algorithms, operating parameters,
process models, or simulators), in our approach the attacker only requires abstract
knowledge of the general information flow in the plant. Based on that information,
we construct a CCL graph and apply graph-based pattern matching based on
several weakness patterns from the CWE database.

Our resulting approach provides us with one (or more) sensors to attack.
Experimentally, we applied and validated our approach on two use cases, and
demonstrated that the approach successfully generates single-shot attacks, i.e.,
near-optimal attacks that reliably shut down a system on the first try.

Our positive results in finding near-optimal targets with limited knowledge
suggest that the difficulty of preparing ICS attacks is lower than previously
thought. We showed not only that the graph analysis can be automated, but also that
the graph creation can be automated (e.g., using BACnet network traffic).
This significantly lowers the bar for ICS attacks and calls for a reassessment of
the actual security risks in ICSs.
Abstract. The seminal work of Heninger and Shacham (Crypto 2009)
demonstrated a method for reconstructing secret RSA keys from partial 
information of the key components. In this paper we further investi- 
gate this approach but apply it to a different context that appears in 
some side-channel attacks. We assume a fixed-window exponentiation 
algorithm that leaks the equivalence between digits, without leaking the 
value of the digits themselves. 

We explain how to exploit the side-channel information with the 
Heninger-Shacham algorithm. To analyse the complexity of the app- 
roach, we model the attack as a Markov process and experimentally 
validate the accuracy of the model. Our model shows that the attack is 
feasible in the commonly used case where the window size is 5. 


1 Introduction 


One of the roles of a cryptographer is to ensure that implementations of 
cryptographic primitives are secure. In recent decades, side-channel attacks 
have been identified as a major threat to the security of cryptographic
implementations. These attacks observe the effects that executing an implementation
of a cryptographic primitive has on the environment in which it executes.
Such effects include the power the device consumes [18,19], its electromagnetic 
emissions [10,32], timing [1,5,29], micro-architectural components [11,23], and 
even acoustic and photonic emanations [12,20]. By measuring these effects, an 
attacker can obtain information on the internal state of the cryptographic algo- 
rithm, which can lead to compromising the security of the primitive. 

In many cases, there is a gap between the information obtained through the 
side channel and secret information, such as plaintexts or keys, which the attacker 
may wish to recover [8]. Techniques to bridge this gap have been developed for
multiple cryptographic schemes [3, 6, 7, 16, 24, 25].

For RSA [33], in many cases the side-channel information provides the private 
key directly, requiring no further analysis [18,31,35]. When only partial informa- 
tion on the private key is available, there are two main approaches for key recov- 
ery. The Coppersmith method factors the RSA public modulus N = pq given 


© Springer Nature Switzerland AG 2022 
G. Ateniese and D. Venturi (Eds.): ACNS 2022, LNCS 13269, pp. 193-211, 2022. 
https://doi.org/10.1007/978-3-031-09234-3_10 


194 C. Chuengsatiansup, A. Feutrill, R. Q. Sim, and Y. Yarom 


enough consecutive bits of the private prime p [7]. The Heninger-Shacham (HS)
algorithm [16] exploits algebraic relationships among the two private primes p
and q, the private exponent d, and the two partial private exponents $d_p$ and $d_q$,
which are used in some implementations of RSA. Past works have used the HS
algorithm to correct errors when the attacker obtains a degraded version of the
key [15, 16, 30], to correct errors in side-channel information [15, 17, 21, 26-28, 30],
and to recover information that is not obtained through the side channel [2, 4, 36].

In this work we consider the case that an attacker obtains knowledge of digit
equivalence of the partial private exponents $d_p$ and $d_q$. Specifically, we assume
that the exponents are represented as digits in radix $2^w$ and that the attacker can
find which digits of the representation are the same without knowing the values of
the digits themselves. Past works showed that such information can be obtained
through side-channel attacks on fixed-window implementations [13] and that
similar information can be obtained for sliding window implementations [17, 22, 34].
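
To make the notion of digit equivalence concrete, the sketch below splits an exponent into radix-$2^w$ digits and relabels each digit by the order in which its value first appears. The resulting pattern reveals which digits are equal but not their values. The function names and the example exponent are ours, and the digits are listed from least to most significant purely for convenience.

```python
def radix_digits(exponent, w):
    """Digits of `exponent` in radix 2**w, least significant first."""
    mask = (1 << w) - 1
    digits = []
    while exponent:
        digits.append(exponent & mask)
        exponent >>= w
    return digits or [0]


def equivalence_pattern(exponent, w):
    """Replace each digit by a label that only encodes equality of digits."""
    labels = {}
    return [labels.setdefault(d, len(labels)) for d in radix_digits(exponent, w)]


example = 0xA3A5  # hypothetical 16-bit exponent
print(radix_digits(example, 4))         # actual digit values (secret)
print(equivalence_pattern(example, 4))  # what the attacker is assumed to see
```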

A naive approach for recovering the key from the digit equivalence
information is to brute force the values of each of the $2^w$ digits. However, such an approach
requires testing $(2^w)!$ combinations, or an expected complexity of $(2^w)!/2$. This
complexity requires significant resources even for w = 4 and is prohibitive for the
commonly used case of w = 5. Past works overcome this limitation by relying
on additional information from the precomputation stage of the fixed-window
and sliding window algorithms. Since hardening modular exponentiation against
side-channel attacks requires additional resources, it may be tempting to harden
only the precomputation stage and rely on the complexity of recovering the key from
the digit equivalence for side-channel protection.
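
The size of this brute-force search can be checked numerically. The sketch below computes $\log_2((2^w)!)$ for the two window sizes discussed, reading the count as the number of ways to assign distinct values to the $2^w$ possible digits; that reading is our interpretation of the formula above.

```python
import math


def log2_factorial(n):
    """log2(n!) computed via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


for w in (4, 5):
    bits = log2_factorial(2 ** w)
    # The expected complexity is half of that, i.e. one bit less.
    print(f"w = {w}: (2^w)! is about 2^{bits:.1f}, expected about 2^{bits - 1:.1f}")
```

Under this reading, the search space is between $2^{44}$ and $2^{45}$ for w = 4 and between $2^{117}$ and $2^{118}$ for w = 5.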


Our Contribution 

In this work we show how to apply the HS algorithm to the problem of recovering 
RSA private keys given digit equivalence. Specifically, we show how to use guesses 
for low significant digits to prune the search space of the HS algorithm when 
processing higher significant digits. 

To analyse the complexity of our algorithm, we develop a theoretical model 
based on Markov chains. We use the model to calculate the probability of success 
and the number of operations required to recover the RSA key. Using this model 
we show that for the case of w = 4, more than 99% of the keys can be broken 
with a search space of size 27°, well within the means of modestly resourced 
adversaries. For the common case of w = 5, the model predicts that 65% of the 
keys can be broken with a search space of 24°, which is within the means of
well-resourced adversaries.

We complement the theoretical analysis with concrete experiments, applying 
our algorithm to randomly generated RSA-2048 keys. We find that the model is 
highly accurate, correctly predicting the success and complexity of the attack. 
Specifically, for the case of w = 4, we can break 987 out of the 1000 keys we
tested, with a search space of $2^{25}$.
2 Background


2.1 RSA 


RSA [33] is a public key system that can be used for encryption and for digital 
signatures. To generate an RSA key, Alice picks two random primes, $p$ and $q$. 
The public key is $(N, e)$, where $N = pq$, and $e$ is chosen such that it is co-prime 
with $\varphi(N) = (p-1)(q-1)$. The private key is $(p, q, d)$ where 
$d = e^{-1} \bmod \varphi(N)$. Most modern implementations use 
$e = 65537 = 2^{16} + 1$, and choose $p$ and $q$ to match the requirement. 

We use $n = \lfloor \log_2 N \rfloor + 1$ to denote the bit length of the public modulus $N$. 
We further assume that the bit length of $p$ and $q$ is $n/2$. 

To encrypt a message $m$, Bob calculates $c = m^e \bmod N$. To decrypt, Alice 
calculates $m = c^d \bmod N$. Signing a message $m$ is done by calculating 
$s = m^d \bmod N$, and the signature is verified by testing that $m = s^e \bmod N$. 
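
As a minimal sketch of the operations above, the following uses deliberately tiny, hypothetical primes ($p = 61$, $q = 53$) and a small public exponent ($e = 17$) instead of 65537. These values are assumptions chosen for readability only and offer no security.

```python
from math import gcd

# Toy parameters (hypothetical, for illustration only).
p, q = 61, 53
N = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # phi(N) = (p-1)(q-1)
e = 17                         # public exponent, must be co-prime with phi(N)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent d = e^{-1} mod phi(N) (Python 3.8+)
n = N.bit_length()             # n = floor(log2 N) + 1

def encrypt(m):
    return pow(m, e, N)        # c = m^e mod N

def decrypt(c):
    return pow(c, d, N)        # m = c^d mod N

def sign(m):
    return pow(m, d, N)        # s = m^d mod N

def verify(m, s):
    return m == pow(s, e, N)   # check m = s^e mod N
```

For example, `decrypt(encrypt(65))` returns the original message 65, and `verify(65, sign(65))` holds.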


CRT-RSA. Alice can reduce the complexity of the private key operations using 
the Chinese Remainder Theorem (CRT). Specifically, Alice precomputes the 
CRT-RSA private key $(p, q, d, d_p, d_q, q_{inv})$, where $d_p = d \bmod (p-1)$, 
$d_q = d \bmod (q-1)$, and $q_{inv} = q^{-1} \bmod p$. To calculate $c^d \bmod N$, 
Alice then computes: 

$$
\begin{aligned}
M_p &= c^{d_p} \bmod p \\
M_q &= c^{d_q} \bmod q \\
h &= q_{inv}(M_p - M_q) \bmod p \\
m &= M_q + hq.
\end{aligned}
$$
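
The CRT recombination can be checked numerically. The sketch below reuses the same hypothetical toy key ($p = 61$, $q = 53$, $e = 17$) and compares the CRT result with a direct exponentiation; the function name `crt_decrypt` is illustrative and not taken from any library.

```python
# Toy CRT-RSA key (hypothetical values, for illustration only).
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))
d_p = d % (p - 1)              # d_p = d mod (p-1)
d_q = d % (q - 1)              # d_q = d mod (q-1)
q_inv = pow(q, -1, p)          # q_inv = q^{-1} mod p

def crt_decrypt(c):
    """Compute c^d mod N with two half-size exponentiations."""
    M_p = pow(c, d_p, p)
    M_q = pow(c, d_q, q)
    h = (q_inv * (M_p - M_q)) % p
    return M_q + h * q
```

The two exponentiations use the exponents $d_p$ and $d_q$, which is why these are the values targeted by the attack discussed later.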


2.2 Fixed-Window Exponentiation 


The fixed-window exponentiation algorithm, shown in Algorithm 1, calculates 
$B^E \bmod M$. The algorithm, parameterised by a window size $w$, represents the 
exponent $E$ as a number in radix $2^w$. We use the notation $E[i]$ to refer to the 
$i$th digit of $E$. That is, $E$ is represented as a sequence of digits $0 \le E[i] < 2^w$, 
such that $E = \sum_i E[i] 2^{wi}$. 

To perform the exponentiation, the algorithm first precomputes $2^w$ values 
$B_j = B^j \bmod M$. It then initialises an intermediate result $r$ to 1 and proceeds to 
scan the exponent $E$ digit by digit from the most significant to the least 
significant. For each digit $E[i]$, the algorithm raises $r$ to the power of $2^w$ modulo $M$ 
using squaring $w$ times, each time reducing the result modulo $M$. It then 
multiplies the result by the precomputed value $B_{E[i]} = B^{E[i]} \bmod M$, again reducing 
modulo $M$. At the end of the algorithm we have 
$r = B^{\sum_i E[i] 2^{wi}} \bmod M = B^E \bmod M$. 
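
The description above can be turned into a short runnable sketch. The function name `fixed_window_pow` is an assumption made for this example; the code follows the precomputation and exponentiation steps as described and is not a hardened implementation.

```python
def fixed_window_pow(B, E, M, w):
    """Compute B**E mod M by scanning E in radix-2**w digits, most significant first."""
    # Precomputation: table[j] = B**j mod M for 0 <= j < 2**w.
    table = [1 % M]
    for _ in range(1, 1 << w):
        table.append(table[-1] * B % M)

    # Split E into radix-2**w digits, least significant digit first.
    digits = []
    while E > 0:
        digits.append(E & ((1 << w) - 1))
        E >>= w

    r = 1 % M
    for digit in reversed(digits):
        for _ in range(w):              # raise r to the power 2**w
            r = r * r % M
        r = r * table[digit] % M        # table lookup indexed by the secret digit
    return r
```

The table lookup `table[digit]` is the step whose access pattern depends on the secret digit, which is what the leakage discussed in the next section observes.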


2.3 Attacks on Fixed-Window Exponentiation 


While the fixed-window algorithm is fairly regular and does not use secret- 
dependent control flow, implementations may still leak information about the 
Algorithm 1: Fixed-window exponentiation 
input : window size $w$, base $B$, modulus $M$, 
        exponent $E = \sum_i E[i] 2^{wi}$ with $0 \le E[i] < 2^w$. 
output: $B^E \bmod M$ 
// Precomputation 
$B_0 \leftarrow 1$ 
for $j$ from 1 to $2^w - 1$ do 
    $B_j \leftarrow B_{j-1} \cdot B \bmod M$ 
end 
// Exponentiation 
$r \leftarrow 1$ 
for $i$ from $|E| - 1$ downto 0 do 
    for $j$ from 1 to $w$ do 
        $r \leftarrow r^2 \bmod M$ 
    end 
    $r \leftarrow r \cdot B_{E[i]} \bmod M$ 
end 
return $r$ 


digit being processed in each iteration. In some cases, the attacker can recover 
(some of) the bits of each digit $E[i]$ [36]. However, a common leakage identifies 
digit equivalence, i.e. detecting when two digits $E[i]$ and $E[j]$ are the same 
without identifying the digits themselves [13,34]. Specifically, Genkin et al. [13] use 
a cache attack [22] to detect victim access patterns to the same digit, and 
Walter [34] exploits a differential power analysis [19] to identify repeating patterns 
in power traces. Several works recover similar information from sliding window 
implementations of modular exponentiation [17,22]. All these works exploit 
leakage during the precomputation phase to recover the key. Specifically, when 
computing $B_{j+1}$, Algorithm 1 uses $B_j$. The order of precomputation is known, 
thus an attacker that identifies the use of $B_j$ in the exponentiation phase can tie 
it to its use in the precomputation phase and recover the digit value. 


2.4 The Heninger-Shacham Algorithm 


The Heninger-Shacham (HS) algorithm [16] uses a branch-and-prune approach 
for recovering an RSA private key from partial information on the bits of the 
components of the private key. Specifically, let $(N, e)$ be an RSA public key and 
$(p, q, d, d_p, d_q)$ be components of the corresponding private key, such that $N = pq$ 
is an $n$-bit RSA modulus with $p$ and $q$ primes, $e = 2^{16} + 1$ is the public exponent, 
$d = e^{-1} \bmod (p-1)(q-1)$ is the private exponent, and $d_p = d \bmod (p-1)$, 
$d_q = d \bmod (q-1)$ are the CRT-RSA private exponents. 

As Heninger and Shacham [16] note, there exist $k$, $k_p$, and $k_q$ with $0 < k < e$ 
such that 

$$
\begin{aligned}
N &= pq \\
ed &= k(N - p - q + 1) + 1 \\
ed_p &= k_p(p - 1) + 1 \\
ed_q &= k_q(q - 1) + 1.
\end{aligned}
$$

Moreover, Inci et al. [17] show that $0 < k_p, k_q < e$ and that given $k_p$ we can find 
$k_q$ and vice versa. 
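
These relations are easy to verify on a small key. The sketch below uses a hypothetical toy key ($p = 61$, $q = 53$, $e = 17$) rather than $e = 2^{16} + 1$, and recovers $k$, $k_p$ and $k_q$ by exact division.

```python
# Toy key (hypothetical values, for illustration only).
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))
d_p, d_q = d % (p - 1), d % (q - 1)

# Each division below is exact because of the defining congruences.
k = (e * d - 1) // (N - p - q + 1)     # ed   = k(N - p - q + 1) + 1
k_p = (e * d_p - 1) // (p - 1)         # ed_p = k_p(p - 1) + 1
k_q = (e * d_q - 1) // (q - 1)         # ed_q = k_q(q - 1) + 1
```

For this toy key the three multipliers come out as $k = 15$, $k_p = 15$ and $k_q = 16$, all strictly between 0 and $e$.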

Let $\tau(x)$ be the exponent of the largest power of two that divides $x$. We 
note that because $e$ is odd, $\tau(ed) = \tau(d)$, $\tau(ed_p) = \tau(d_p)$, and 
$\tau(ed_q) = \tau(d_q)$. Heninger and Shacham [16] first show how to find 
$d \bmod 2^{\tau(k)+2}$, $d_p \bmod 2^{\tau(k_p)+1}$, and $d_q \bmod 2^{\tau(k_q)+1}$. 
They then define a slice of the private key as 

$$
\mathrm{slice}(i) = (p[i], q[i], d[i + \tau(k)], d_p[i + \tau(k_p)], d_q[i + \tau(k_q)])
$$

where $i$ indicates the bit index starting from the least significant bit. Therefore 
$p[i]$ is the $i$th bit of $p$ and $p[0]$ refers to the least significant bit of $p$. Finally, they 
show that if we have a partial solution $(p', q', d', d_p', d_q')$ for slice(0) to slice(i - 1), 
the following four congruences hold. 

$$
\begin{aligned}
p[i] + q[i] &\equiv (N - p'q')[i] \pmod 2 && (1) \\
d[i + \tau(k)] + p[i] + q[i] &\equiv (k(N + 1) + 1 - k(p' + q') - ed')[i + \tau(k)] \pmod 2 && (2) \\
d_p[i + \tau(k_p)] + p[i] &\equiv (k_p(p' - 1) + 1 - ed_p')[i + \tau(k_p)] \pmod 2 && (3) \\
d_q[i + \tau(k_q)] + q[i] &\equiv (k_q(q' - 1) + 1 - ed_q')[i + \tau(k_q)] \pmod 2 && (4)
\end{aligned}
$$

Note that because $p$ and $q$ are primes and by the definition of $\tau(\cdot)$, we have that 
slice(0) = (1, 1, 1, 1, 1). 
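
To illustrate the branching step, the following sketch extends partial solutions for $p$ and $q$ only, using congruence (1). It is a simplified illustration, not the full HS algorithm: congruences (2) to (4) and any side information are omitted, so nothing is pruned and the number of candidates doubles at every slice. The helper names `bit` and `extend_pq` are assumptions made for this example.

```python
def bit(x, i):
    """Bit i of x; Python's >> behaves like two's complement for negative x."""
    return (x >> i) & 1

def extend_pq(N, candidates, i):
    """Extend partial (p', q') solutions for bits 0..i-1 with bit i via congruence (1)."""
    extended = []
    for p1, q1 in candidates:
        target = bit(N - p1 * q1, i)          # right-hand side of congruence (1)
        for p_bit in (0, 1):
            q_bit = target ^ p_bit            # p[i] + q[i] = target (mod 2)
            extended.append((p1 | (p_bit << i), q1 | (q_bit << i)))
    return extended

# Usage with a toy modulus N = 61 * 53; slice(0) fixes p[0] = q[0] = 1.
candidates = [(1, 1)]
for i in range(1, 6):
    candidates = extend_pq(3233, candidates, i)
```

After processing slice $i$, every surviving pair satisfies $p'q' \equiv N \pmod{2^{i+1}}$, and the low bits of the true factors are always among the candidates.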

The HS algorithm was proposed in the context of cold boot attacks [14], 
where most of the errors are bits containing 1 that decay into 0. Further work 
has investigated the HS algorithm with unbalanced bidirectional errors [15,30]. 
The HS algorithm has been further applied in the context of side-channel attacks 
which can have noisy measurements [21,26-28]. The HS algorithm can also be used 
to complete partial information obtained through cache attacks [2,4,36]. 


2.5 Markov Chains 


This section introduces terminologies and relevant facts that we use in our anal- 
ysis. We start with the definition of a Markov chain. 


Definition 1. A discrete-time stochastic process $\{X_n\}_{n \in \mathbb{Z}^+}$ on a countable state 
space $\Omega$ is called a Markov chain if for every $n$ 

$$
\Pr(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots, X_1 = x_1) = \Pr(X_n = x_n \mid X_{n-1} = x_{n-1}).
$$

The intuition for the definition is that the history of a Markov chain is only 
considered through the current state, and the knowledge of the previous states 
has no impact on the movement to the next state. 

An additional property that we use in the modelling is time-homogeneity. 
This refers to the fact that the probabilities of transitioning to the next state, 
with knowledge of the current state, do not change over time. That is, 

$$
\Pr(X_{n+1} = j \mid X_n = i) = \Pr(X_n = j \mid X_{n-1} = i).
$$


Using the property of time-homogeneity, we then define the probabilities of 
transitioning between two states as 

$$
p_{ij} = \Pr(X_n = j \mid X_{n-1} = i).
$$

These can be generalised to the $k$-step transition probabilities by considering 
the probability of transitioning between states in $k$ steps. That is, 

$$
p_{ij}^{(k)} = \Pr(X_{n+k} = j \mid X_n = i).
$$

The analysis of Markov chains is greatly simplified knowing these properties. 
For example, we create a matrix $P$ of the transition probabilities of each 
of the possible transitions where each entry of the matrix is $[P]_{i,j} = p_{ij}$. This 
forms a stochastic matrix, where each row sums to 1, since each state has its 
own probability distribution. The advantage of creating this matrix is that we 
can easily compute the $k$-step transition probabilities by taking powers of the 
matrix $P$ [9, Theorem 1.1]. Then we have that the $k$-step transition probabilities 
can be calculated as 

$$
p_{ij}^{(k)} = [P^k]_{i,j}.
$$
Therefore, describing a problem in this way enables us to convert a poten- 
tially computationally difficult problem of calculating the probabilities of moving 
between two states in k steps, from a combinatorial problem, whose complex- 
ity grows quickly in the number of steps, to a linear algebra problem of taking 
matrix powers. 
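
The equivalence between the combinatorial and the linear-algebra views can be checked on a small example. The sketch below uses a hypothetical 3-state chain with exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, k):
    """k-th power of P by repeated multiplication (k >= 0)."""
    n = len(P)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result

def k_step_by_paths(P, i, j, k):
    """Sum the probabilities of all k-step paths from i to j (k >= 1)."""
    total = Fraction(0)
    for middle in product(range(len(P)), repeat=k - 1):
        path = (i, *middle, j)
        prob = Fraction(1)
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        total += prob
    return total

# Hypothetical 3-state stochastic matrix (each row sums to 1).
F = Fraction
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
```

The path enumeration visits a number of paths that grows exponentially in $k$, whereas `mat_pow` needs only $k$ matrix multiplications.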


3 Attacker Model 


Recall that the aim is to recover the secret exponent $E$, where $E$ represents $d_p$ 
and $d_q$, used during the RSA exponentiation routine. The attacker knows that 
the victim performs the exponentiation using a fixed-window method whose 
width $w$ is publicly known. 

We assume that the attacker can observe, via the side channel, the digit 
equivalence of the secret exponent. Note that the attacker does not know the 
values of those digits; the attacker only knows whether, for example, $E[i]$ equals 
$E[j]$ for $i \ne j$. 

1010  0111  0001  1010  1100  1010  1100  0111 
E[7]  E[6]  E[5]  E[4]  E[3]  E[2]  E[1]  E[0] 

Fig. 1. A visualisation of digit equivalence for exponent E with w = 4. 

Figure 1 illustrates digit equivalence for $w = 4$. The attacker does know that 
$E[2] = E[4] = E[7]$ but does not know that they are 1010. Similarly, the 
attacker knows that $E[0] = E[6]$ but does not know their value. Moreover, the 
attacker also knows that $E[0]$, $E[1]$, $E[2]$ and $E[5]$ are pairwise distinct. 
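
The information available to the attacker in the Fig. 1 example can be expressed as a labelling of digit positions. In the sketch below, the helper names `digits_of` and `equivalence_pattern` are assumptions made for this illustration; equal labels mean equal digits, and the labels themselves reveal nothing about the digit values.

```python
def digits_of(E, w, count):
    """Return the radix-2**w digits E[0], ..., E[count-1], least significant first."""
    return [(E >> (w * i)) & ((1 << w) - 1) for i in range(count)]

def equivalence_pattern(digits):
    """Label each digit by order of first appearance."""
    labels = {}
    return [labels.setdefault(digit, len(labels)) for digit in digits]

# The exponent from Fig. 1, written from E[7] down to E[0].
E = 0b1010_0111_0001_1010_1100_1010_1100_0111
pattern = equivalence_pattern(digits_of(E, 4, 8))
```

Here `pattern[2]`, `pattern[4]` and `pattern[7]` carry the same label, as do `pattern[0]` and `pattern[6]`.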

Continuing with this example, $w = 4$ gives $2^w = 2^4 = 16$ possible different 
values of the digits. This means that the naive approach of determining the 
digits requires $16! \approx 2^{44}$ trials. For well-funded organisations, this attack is 
feasible. However, the commonly used $w$ is usually larger than this. 

For $w = 5$, as used in OpenSSL [36], there are a total of $2^5 = 32$ different 
digit values. The naive approach would require $32! \approx 2^{118}$ trials, rendering this 
attack infeasible, even for well-funded organisations such as the NSA. 
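
The two estimates can be reproduced with a few lines; the function name `naive_log2_trials` is an assumption made for this sketch.

```python
import math

def naive_log2_trials(w):
    """log2 of (2**w)!, the number of ways to assign values to 2**w distinct digits."""
    return math.log2(math.factorial(1 << w))
```

Evaluating the function for $w = 4$ and $w = 5$ gives values between 44 and 45 and between 117 and 118, respectively, in line with the rounded figures above.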


4 Our Approach 


We apply the HS algorithm with a branch-and-bound strategy together with 
pruning from our knowledge of digit equivalence. Recall that the HS algorithm 
reconstructs the CRT-RSA private components by looking at slices of the private 
key $(p, q, d, d_p, d_q)$. At every slice(i), the algorithm builds the key by satisfying 
the four congruence relations described in Eqs. (1) to (4). 


4.1 Algorithm Overview 


In addition to the HS algorithm, we take advantage of side-channel information 
regarding digit equivalence. This allows us to further prune the solution space by 
removing any solutions that do not agree with our knowledge of digit equivalence. 
As a consequence, we significantly reduce the solution space and can reconstruct 
the key for larger window widths. 

Our algorithm follows the HS algorithm and starts building the solution space 
from the least significant bit denoted by bit 0. When considering $d_p$ and $d_q$ (hence 
$k_p$ and $k_q$), recall that slice(i) considers the bits $d_p[i + \tau(k_p)]$ and $d_q[i + \tau(k_q)]$. 
This results in two scenarios. One is where $\tau(k_p)$ and $\tau(k_q)$ are zero, which we 
denote as the aligned case. The other is where $\tau(k_p)$ and $\tau(k_q)$ are not zero, 
which we denote as the unaligned case. We begin our analysis with the aligned 
case. The unaligned case is discussed in Sect. 4.4. 


4.2 Complexity Analysis of the Aligned Case 


Assume the RSA fixed-window exponentiation uses a window width $w$. This 
means that there are $2^w$ different digits. Recall that we consider slice(i) and build 
bit $i$ for $p$, $q$, $d$, $d_p$ and $d_q$. Furthermore, recall that slice(0) is known. Because we 
assume $\tau(k_p) = \tau(k_q) = 0$, we know exactly one bit of the first (least significant) 
digit of each of $d_p$ and $d_q$. Consequently, for the first digit, there remain $2^{w-1}$ 
possible partial solutions. Because we do not know any further information about 
the first digit, we cannot prune any possible solution at this step. 

In the subsequent slices, there are two possibilities for pruning. 


1. The first case is where the value of the current digit of dp (or dq) is equivalent 
to one of the previously observed digits. We can compare the current partial 
solution at each bit slice of the current digit and reject the solutions that do 
not match the bits of the equivalent digit seen previously. 

2. The second case is where the value of the current digit is not equivalent to 
any of the previously observed digits. In this case, we can eliminate solutions 
where the value of its current digit equals a value of a previously seen digit. 
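
The two pruning rules can be sketched as a consistency check between a candidate digit sequence and the observed equivalence labels. The function name `consistent` and the label representation (one integer per digit position, equal labels meaning equal digits) are assumptions made for this illustration.

```python
def consistent(candidate, pattern):
    """Return True if the candidate digits agree with the observed equivalence labels."""
    value_of = {}                              # equivalence label -> digit value
    for digit, label in zip(candidate, pattern):
        if label in value_of:
            # Rule 1: a digit seen before must repeat the earlier value.
            if value_of[label] != digit:
                return False
        else:
            # Rule 2: a new digit must differ from every value seen so far.
            if digit in value_of.values():
                return False
            value_of[label] = digit
    return True
```

For instance, with labels `[0, 1, 0]` the candidate `[13, 10, 13]` survives, while `[13, 10, 12]` violates the first rule and `[13, 13, 13]` violates the second.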


In our algorithm, we model the search space as a search tree. The starting 
partial solution at slice(0) is at the root. Each level of the tree is a slice of the par- 
tial solution. The tree width reflects the number of solutions kept after pruning 
that level. For the purpose of the statistical analysis we make two assumptions 
about the statistical distributions of digits in the keys. 


Assumption 1 (digit independence) 

No correlation between lower and higher significant key bits. That is, given the 
knowledge of lower significant bits observed in the past, we do not gain further 
information regarding higher significant bits to be explored in the future. 


Assumption 2 (key independence) 
No dependency between $p$ and $q$, thus $d_p$ and $d_q$. This means that the knowledge 
of $d_p$ (resp. $d_q$) does not provide additional information to infer $d_q$ (resp. $d_p$). 


We note that neither assumption holds in practice: Coppersmith [7] likely 
implies that Assumption 1 is invalid and Heninger and Shacham [16] invalidate 
Assumption 2. Hence, we only use them to facilitate the statistical analysis. The 
agreement between our model and the experiments indicates that violations of 
the assumptions do not result in significant statistical differences. We further 
note that any violation of these assumptions is likely to facilitate attacks on 
RSA. 

We now consider slice(i). For the aligned case, slice(i) contains bits $d_p[i]$ 
and $d_q[i]$, which fall in digits $d_p[\lfloor i/w \rfloor]$ and $d_q[\lfloor i/w \rfloor]$, respectively. The 
side-channel information regarding the digit equivalence of $d_p[\lfloor i/w \rfloor]$ and 
$d_q[\lfloor i/w \rfloor]$ is categorised into four possibilities: 


(P1) Both $d_p[\lfloor i/w \rfloor]$ and $d_q[\lfloor i/w \rfloor]$ have been seen; 
(P2) $d_p[\lfloor i/w \rfloor]$ has been seen, but $d_q[\lfloor i/w \rfloor]$ has not; 
(P3) $d_q[\lfloor i/w \rfloor]$ has been seen, but $d_p[\lfloor i/w \rfloor]$ has not; 
(P4) Neither $d_p[\lfloor i/w \rfloor]$ nor $d_q[\lfloor i/w \rfloor]$ has been seen. 


0111    0001    1010    1101    1010    1101 
d_p[5]  d_p[4]  d_p[3]  d_p[2]  d_p[1]  d_p[0] 
0101    0011    1000    1110    1000    0011 
d_q[5]  d_q[4]  d_q[3]  d_q[2]  d_q[1]  d_q[0] 

Fig. 2. An example of seen and unseen digits in the aligned case, w = 4. 

Figure 2 illustrates seen and unseen digits of $d_p$ and $d_q$, in the aligned case, 
where $w = 4$. Each box represents a digit whose value is printed within the box 
along with colors used to represent its value. The bit positions are given below 
the boxes. Recall that the attacker does not know the values of the digits; they 
only know the digit equivalence. Considering this scenario, where $w = 4$, there 
would be $2^3$ solutions at the end of the first digit at slice(3). The subsequent digits 
for $d_p$ and $d_q$ are both unseen, corresponding to (P4), so pruning can only occur 
at the end of the digit at slice(7). Using the notation introduced previously, any 
solution where $E[1] = E[0]$ in either $d_p$ or $d_q$ is pruned. Moving on to the next 
digit, we get scenario (P2). The digit $d_p[2]$ has been seen previously in $d_p[0]$ 
and thus pruning could occur at each of slice(8) to slice(11), i.e. solutions where 
$E[2] \ne E[0]$ for $d_p$ are pruned. Additional pruning could occur from the unseen 
digit $d_q[2]$ at slice(11). The search continues and the pruning at each slice 
depends on whether the current digit has been seen. 

Because the side-channel information only applies to full digits, i.e. groups 
of $w$ bits, we only perform the pruning at a digit boundary. That is, we combine $w$ 
steps of the HS algorithm. To simplify notation, we use $y$ to refer to the digit 
number where bit $i$ falls, i.e. $y = \lfloor i/w \rfloor$. 

Let $y_y$ and $z_y$ be the numbers of unique digits that have been observed at 
$d_p[0], \ldots, d_p[y]$ and $d_q[0], \ldots, d_q[y]$, respectively. We now make the concept of 
a previously seen digit more concrete by saying that a digit $d_p[y]$ (resp. $d_q[y]$) 
has been seen before if $y_y = y_{y-1}$ (resp. $z_y = z_{y-1}$). 

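Define two random variables $Y_y$ and $Z_y$ from the space $\{0, \ldots, 2^w - 1\}$ for 
the number of unique digits observed after reading $y$ digits. Hence, the four 
possibilities above, i.e., (P1) to (P4), correspond to the four possibilities in Eq. 5 for 
moving to observe the next slice. That is, given the previous value $(y_{y-1}, z_{y-1})$, 
we obtain the following probabilities: 

$$
\Pr\big((Y_y, Z_y) = (y_y, z_y) \mid (Y_{y-1}, Z_{y-1}) = (y_{y-1}, z_{y-1})\big) =
\begin{cases}
\left(\dfrac{y_{y-1}}{2^w}\right)\left(\dfrac{z_{y-1}}{2^w}\right) & \text{if } y_y = y_{y-1} \text{ and } z_y = z_{y-1} \\[2ex]
\left(\dfrac{y_{y-1}}{2^w}\right)\left(\dfrac{2^w - z_{y-1}}{2^w}\right) & \text{if } y_y = y_{y-1} \text{ and } z_y = z_{y-1} + 1 \\[2ex]
\left(\dfrac{2^w - y_{y-1}}{2^w}\right)\left(\dfrac{z_{y-1}}{2^w}\right) & \text{if } y_y = y_{y-1} + 1 \text{ and } z_y = z_{y-1} \\[2ex]
\left(\dfrac{2^w - y_{y-1}}{2^w}\right)\left(\dfrac{2^w - z_{y-1}}{2^w}\right) & \text{if } y_y = y_{y-1} + 1 \text{ and } z_y = z_{y-1} + 1
\end{cases}
\qquad (5)
$$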
We derive these probabilities by utilising the independence of the two keys, $d_p$ 
and $d_q$. Therefore, we consider each contribution to the probability separately 
and the contribution is either the proportion of seen digits or the proportion of 
unseen digits, depending on whether the current digit has been seen in the key 
stream. Therefore, we can calculate the probability of a particular key sequence 
by multiplying the probability of the individual components. In Sect. 4.3 we 
discuss the formulation of these probabilities into two independent Markov chains 
to model the key recovery. 

Note that the complexity of our attack depends on the size of the search 
space, i.e. the number of nodes in the search tree. Let $W_y$ be a random variable 
that denotes the search space (the number of possible candidate keys) or tree 
width after $y$ digit steps. The change of the width at each step is defined as 
follows. 


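$$
\frac{W_y}{W_{y-1}} =
\begin{cases}
\dfrac{1}{2^w} & \text{if } y_y = y_{y-1} \text{ and } z_y = z_{y-1} \\[2ex]
\dfrac{2^w - z_y}{2^w} & \text{if } y_y = y_{y-1} \text{ and } z_y = z_{y-1} + 1 \\[2ex]
\dfrac{2^w - y_y}{2^w} & \text{if } y_y = y_{y-1} + 1 \text{ and } z_y = z_{y-1} \\[2ex]
\dfrac{(2^w - y_y)(2^w - z_y)}{2^w} & \text{if } y_y = y_{y-1} + 1 \text{ and } z_y = z_{y-1} + 1
\end{cases}
$$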


Observe that the change in the width only depends on the number of unique 
digits that have been seen and the number of digits scanned, $y$. In other words, 
the width is not dependent upon the sequence of $Y_y$ or $Z_y$ but the value at $y$. 
Since the first digit must be odd (recall that slice(0) = (1, 1, 1, 1, 1)), its first 
bit must be one. This means that there are fewer possibilities for the first digit. 
Consequently, this gives a factor of $2^{w-1}$ for the first width since we have one 
fewer binary choice for the first bit. Therefore, the width after reading in $y$ digits 
from each of $d_p$ and $d_q$ is 

$$
W_y = \frac{2^{w-1}}{2^{wy}} \prod_{m=1}^{y_y} (2^w - m) \prod_{m=1}^{z_y} (2^w - m). \qquad (6)
$$

As noted previously, the width is independent of the order of the sequence. The 
expression takes the product of the $y_y$ and $z_y$ numerators of the change in widths 
and the initial width $2^{w-1}$, then divides by the number of $2^w$ for $y$ digits scanned. 
Assuming a threshold of $2^t$, we have 


$$
\frac{2^{w-1}}{2^{wy}} \prod_{m=1}^{y_y} (2^w - m) \prod_{m=1}^{z_y} (2^w - m) > 2^t
$$

and therefore to exceed the threshold we need 

$$
\prod_{m=1}^{y_y} (2^w - m) \prod_{m=1}^{z_y} (2^w - m) > 2^{t + 1 + (y-1)w}.
$$
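
As a rough numerical sketch, the width can be evaluated with exact rational arithmetic. The code assumes the reading of Eq. 6 in which the width equals $2^{w-1}/2^{wy}$ times the two products of $(2^w - m)$ for $m = 1, \ldots, y_y$ and $m = 1, \ldots, z_y$; the function names and this index convention are assumptions made for the illustration.

```python
from fractions import Fraction
from math import prod

def numerator(w, y_y, z_y):
    """Product of the (2**w - m) factors contributed by new digits of d_p and d_q."""
    base = 1 << w
    return (prod(base - m for m in range(1, y_y + 1))
            * prod(base - m for m in range(1, z_y + 1)))

def width(w, y, y_y, z_y):
    """Width 2**(w-1) / 2**(w*y) * numerator, following the assumed reading of Eq. 6."""
    return Fraction(2 ** (w - 1) * numerator(w, y_y, z_y), (1 << w) ** y)

def exceeds_threshold(w, y, y_y, z_y, t):
    """True if the width is larger than the threshold 2**t."""
    return width(w, y, y_y, z_y) > 2 ** t
```

With $w = 4$ and no further digits read, `width(4, 0, 0, 0)` gives the initial width $2^3 = 8$. Multiplying out the powers of two shows that the threshold test is equivalent to comparing the product of the numerators with $2^{t + 1 + (y-1)w}$.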
4.3 Independent Markov Chains 


Recall the two assumptions in our analysis, namely, digit independence (i.e. 
previously observed digits do not determine unexplored digits) and key 
independence (i.e. knowing $d_p$ does not reveal $d_q$ or vice versa). We use these 
properties to create identically distributed Markov chains and analyse these chains 
operating on $d_p$ and $d_q$ independently. Each Markov chain has the state space 
$\Omega = \{0, \ldots, 2^w - 1\}$ whose transitions have two possibilities: 


1. Sample a digit that has previously been seen, or 
2. Sample a digit that has not been seen. 


Therefore, we define the probability transitions as 

$$
Y_y =
\begin{cases}
Y_{y-1} & \text{with probability } \dfrac{Y_{y-1}}{2^w} \\[2ex]
Y_{y-1} + 1 & \text{with probability } \dfrac{2^w - Y_{y-1}}{2^w}
\end{cases}
$$


Using this formulation, we construct a probability transition matrix $P$. An 
example for a 4-state chain is given below. Notice that the matrix has non-zero 
probabilities on the main diagonal and the diagonal above only. 

$$
P = \begin{pmatrix}
\frac{1}{4} & \frac{3}{4} & 0 & 0 \\
0 & \frac{1}{2} & \frac{1}{2} & 0 \\
0 & 0 & \frac{3}{4} & \frac{1}{4} \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

Let $Y_y$ be a random variable for the number of unique digits observed at 
digit $y$. Thus, the Markov chain tracking the evolution of either $d_p$ or $d_q$ is 

$$
\{Y_y\}_{y \in \{0, \ldots, 2^w - 1\}}.
$$

Given the Markov chain structure, we can calculate the probabilities of having 
observed $y_y$ unique digits, after observing $y$ digits. 

Note that this Markov chain is time-homogeneous since the transition 
probabilities do not change with the number of digits that have been read. This allows 
the calculation of the probabilities at each number of digits $y$ read as the $y$th 
power of the probability transition matrix. That is, 

$$
\Pr(Y_y = y_y \mid Y_1 = y_1) = [P^y]_{y_1, y_y}
$$

since both $d_p$ and $d_q$ are independent and identically distributed. 
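
The transition matrix and its powers can be computed exactly for small $w$. The sketch below assumes states numbered $1, \ldots, 2^w$ (the number of unique digits seen so far), where state $s$ is kept with probability $s/2^w$ and advanced with probability $(2^w - s)/2^w$; the function names are illustrative.

```python
from fractions import Fraction

def transition_matrix(w):
    """Row s-1 holds the transition probabilities out of state s (s unique digits seen)."""
    n = 1 << w
    P = [[Fraction(0)] * n for _ in range(n)]
    for s in range(1, n + 1):
        P[s - 1][s - 1] = Fraction(s, n)         # next digit has been seen before
        if s < n:
            P[s - 1][s] = Fraction(n - s, n)     # next digit is new
    return P

def distribution_after(w, steps):
    """Top row of P**steps: distribution of unique digits, starting from one unique digit."""
    P = transition_matrix(w)
    n = len(P)
    row = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(steps):
        row = [sum(row[i] * P[i][j] for i in range(n)) for j in range(n)]
    return row
```

For $w = 2$ this produces a 4-state chain, and `distribution_after(2, 2)` returns the probabilities of having seen 1, 2, 3 or 4 unique digits two steps after the first digit.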
The probability transition matrix $P$ and its powers completely determine the 
system. Note also that we consider the initial state of beginning the first digit 
with a single unique value. This means that we will always consider the top row 
of the matrix $P$ for calculation of the relevant probabilities. 

To finalise this discussion, thanks to the independence assumptions, we can 
calculate the transition probabilities for random variables $Y_y$ and $Z_y$, the number 
of unique digits observed by digit $y$ of $d_p$ and $d_q$ respectively, 

$$
\Pr(Y_y = l, Z_y = m) = \Pr(Y_y = l)\Pr(Z_y = m) = [P^y]_{1,l}\,[P^y]_{1,m}.
$$

Note that the index of the starting state is 1, as we consider the probability of 
moving from state 1 to $j$ in $y$ steps. 

Regarding computational complexity, the naive calculation seen previously 
required that at each step, $w$ probabilities are calculated, thus resulting in 
the complexity $O(w^y)$. Utilising the Markov chain approach, the calculation 
is reduced to taking the $y$ powers of $P$ where each matrix multiplication is 
$O((2^w)^{2.37}) = O(2^{2.37w})$. This means that our approach has the computational 
complexity of $O(y\,2^{2.37w})$. 


4.4 Unaligned Case 


As previously mentioned, many real examples do not begin scanning digits from 
the least significant bit, i.e. $\tau(k_p)$ and $\tau(k_q)$ are not zero. Our analysis suggests 
modelling these offsets as independent geometric random variables. That is, for each 
bit stream of $d_p$ and $d_q$, the offset $O$ takes the value $o$ with probability 

$$
\Pr(O = o) = \frac{1}{2^{o+1}}, \quad o \in \{0, 1, \ldots\}.
$$
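
A quick count supports the geometric model under the assumption, made only for this sketch, that the multiplier $k$ is uniformly distributed over $0 < k < e$ with $e = 65537$. The helper names `tau` and `offset_histogram` are illustrative.

```python
from collections import Counter

def tau(x):
    """Exponent of the largest power of two dividing x (x must be non-zero)."""
    return (x & -x).bit_length() - 1

def offset_histogram(e):
    """Count how often each offset tau(k) occurs for 0 < k < e."""
    return Counter(tau(k) for k in range(1, e))

hist = offset_histogram(65537)
```

Half of the values of $k$ have offset 0, a quarter have offset 1, and so on, so the frequency of offset $o$ is close to $1/2^{o+1}$.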


We calculate the change in width by taking the assumption that all bits 
before the offset are known and we retain knowledge of their values. This has 
no impact on the evolution of the Markov chain. Therefore, we can utilise the 
same probability transitions while adjusting the weight by the offset. As a result, 
given the offsets $o_p$ and $o_q$, we have that the width can be calculated as 

$$
W_y = \frac{2^{w-1}}{2^{wy + o_p + o_q}} \prod_{m=1}^{y_y} (2^w - m) \prod_{m=1}^{z_y} (2^w - m). \qquad (7)
$$

That is, we derive this expression from Eq. 6 by adjusting for the known bits, 
which are those before the offset. Note that this means that any offset 
will lower the width for the same digit $y$ and numbers of unique digits observed, $y_y$ and 
$z_y$. Therefore, the case with no offsets for scanning digits will have the largest 
width, for identical keys. 

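We calculate the expected width at each digit $y$ by summing over widths 
of the offset Eq. 7 and the probabilities of being in a state $(y_y, z_y)$ from Eq. 5. 
Explicitly, this gives the expected width for a digit $y$ of 

$$
E[W_y] = \sum_{y_y \le y} \sum_{z_y \le y} \sum_{o_p} \sum_{o_q}
\Pr(Y_y = y_y)\Pr(Z_y = z_y)\Pr(O_p = o_p)\Pr(O_q = o_q)\,
\frac{2^{w-1}}{2^{wy + o_p + o_q}} \prod_{m=1}^{y_y} (2^w - m) \prod_{m=1}^{z_y} (2^w - m).
$$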


5 Results and Comparisons 


We theoretically and experimentally evaluate our approach of reconstructing 
RSA private keys given side-channel information of digit equivalence. For the 
former, we use our derived formulas to estimate the search space and success 
probability. For the latter, we run our algorithm on a high-performance cluster 
and observe the convergence to the solution. In both cases, we also set a threshold 
on the space complexity. 


5.1 Theoretical Results 


For the theoretical evaluation, we consider the RSA fixed-window exponentiation 
with $w = 4$ and $w = 5$. For each $w$, we use three different thresholds corresponding 
to three computation budgets. These three thresholds are $2^{25}$, $2^{40}$ and $2^{60}$, which 
represent resource-constrained attackers, well-funded organisations, and 
nation-state organisations such as the NSA, respectively. 

The results for $w = 4$ are shown in Fig. 3. They suggest that the algorithm can 
recover the majority of the keys before reaching the lowest threshold of $2^{25}$, that 
of resource-constrained attackers. To be more precise, 99.9% of the keys can 
successfully be recovered. Further analysis shows that the keys that exceed the maximum 
width have many consecutive unique digits at the beginning of the key. Thus, 
the width grows much more quickly than infeasible keys can be pruned. 

The results for $w = 5$ are shown in Fig. 4. As expected, the percentiles, median 
and mean all increase as the window width increases from $w = 4$ to $w = 5$. 
Even though it becomes more challenging for resource-constrained attackers, it 
is feasible for well-funded organisations. 

Fig. 3. Distribution of width seen at each slice for w = 4 for both the theoretical and 
experimental results. 

Fig. 4. Distribution of width seen at each slice of the Markov chain model, w = 5. 


These plots give insight into how the key recovery behaves. Most of the typical 
behaviour of keys is contained within a relatively small band demonstrated by the 
middle 50%. The mean is higher than the 75th percentile, which highlights that 
the maxima tend to be much higher than the middle values. The influence of the 
unaligned keys lowers the width in general, as the resulting reduction in width 
is a power-of-two offset. This has a large impact in lowering the median and 
percentile ranges, since many combinations of unaligned keys still occur with 
high probability while greatly reducing the width. Table 1 
summarises the success probability and the threshold for window width $w = 4$ 
and $w = 5$. 


Table 1. Success rate in reconstructing keys for different w and thresholds 


Threshold   $2^{25}$   $2^{40}$   $2^{60}$ 
w = 4       99.9%      100.0%     100.0% 
w = 5       8.2%       64.8%      99.9% 


5.2 Experimental Results 


To demonstrate the practicality of our attack, we implement and run the attack 
on a high-performance cluster. We are interested in the behaviour of convergence 
to a solution before reaching the search tree width threshold, which we set to 
$2^{25}$ (resource-constrained attackers). If the solution space exceeds this threshold 
width, the search is abandoned as it would be too computationally intensive to 
continue the search. 

The distributions of the search tree width at each bit slice for $w = 3$ and $w = 4$ are 
shown in Fig. 5 and Fig. 3, respectively. They show the widths up to slice(85). The 
widths at the subsequent layers fluctuate between 1 and 2. One of these solutions 
is the key that we want to recover. These results are generated with 1000 
samples with randomly generated secret values. Each key is randomly generated 
with $e = 65537$ and a 2048-bit RSA modulus. 


Fig. 5. Distribution of width seen experimentally at each slice, w = 3. (Horizontal 
axis: tree depth (slice); vertical axis: search tree width; series: median, average, 
max/min, middle 50%.) 


The success rate for $w = 4$ is 98.7%. From these results, we can observe 
that the search widths follow a general pattern of an exponential-like increase 
before an exponential-like decrease. This aligns with our understanding. Initially, 
building the key with no seen digits, we expect there to be a growing number 
of possibilities for the digit string. As we move further along the slices of the 
key, at some point, we would have enough information about the key to narrow the 
search space. 

Table 2 lists the values of the widths seen when $w = 4$ and the threshold is $2^{25}$. 
Note that the maximum and minimum widths seen in the experimental results are 
not the true maximum and minimum as they are affected by the attacker-imposed 
threshold. The search is abandoned when the tree width exceeds the threshold, 
so the tree width drops to zero when an experiment passes 
the threshold, and the maximum width captured is the width right before 
the run was abandoned. The theoretical results also account for extremely low 
probability events; thus a higher theoretical maximum is expected. 

Table 2. Comparison of the width seen when w = 4 and threshold $2^{25}$ 


          Theoretical                  Experimental 
          bit index   maximum          bit index   maximum 
mean      39          570 000          38          790 000 
min*      3           8                20          3 
25%       31          880              31          33 000 
median    31          9700             26          26 000 
75%       35          98 000           30          240 000 
max*      47          1 400 000 000    38          67 000 000 

* Expect different theoretical and experimental results. 
The experimental results stop after the search tree 
exceeds the threshold. 


Despite the difference in the success rate, 98.7% from experiments and 99.9% 
using the theoretical model, the general behaviour of the search tree is similar. 
We see this in Fig. 3 and Table 2. Ignoring the minimum and maximum due 
to the difference in performing the search experimentally, the mean of the tree 
width reaches its peak around bit index 40 with roughly the same value in both 
theoretical and experimental results. The middle 50% range occurs roughly at 
bit index 32, although the range of values is much lower in the theoretical 
results. Again, this could be due to the modelling accounting for extremely low 
probability events. 


6 Conclusions 


In this work we apply the Heninger-Shacham algorithm in a new context. We 
assume a side-channel adversary that can observe the equivalence of digits in 
the private exponents of CRT-RSA. We show how to apply the algorithm given 
such information and develop a theoretical model that allows us to analyse the 
complexity of the attack. The model shows that the attack is feasible for a 
suitably funded organisation with a window size of 5 bits. We further validate 
the model through experimentation with randomly chosen RSA keys. 

Our model assumes that the digit equivalence information is complete. A 
potential extension of this work is to evaluate cases where we have partial infor- 
mation. For example, when there are errors in the digit equivalence information 
or when we only know the class of the digits (e.g. the Hamming weight). The 
work could also be extended to consider cases that use sliding window exponen- 
tiation. 

The results presented here have, once again, made apparent the importance 
of using constant-time implementations against side-channel attacks. 


Abstract. Trifork is a family of pseudo-random number generators 
described in 2010 by Orue et al. It is based on Lagged Fibonacci 
Generators and has been claimed to be cryptographically secure. In 2017, a 
new family of lightweight pseudo-random number generators, Arrow, was presented. 
These generators are based on the same techniques as Trifork and 
designed to be light, fast and secure, so they can allow private 
communication between resource-constrained devices. The authors based their 
choices of parameters on NIST standards on lightweight cryptography 
and claimed these pseudo-random number generators were of 
cryptographic strength. 

We present practical implemented algorithms that reconstruct the 
internal states of the Arrow generators for different parameters given 
in the original article. These algorithms enable us to predict all the fol- 
lowing outputs and recover the seed. These attacks are all based on a 
simple guess-and-determine approach which is efficient enough against 
these generators. 

We also present an implemented attack on Trifork, this time using 
lattice-based techniques. We show it cannot have more than 64 bits of 
security, hence it is not cryptographically secure. 


Keywords: Pseudo-random number generators · Guess-and-determine · Cryptanalysis · Lattices 


1 Introduction 


Randomness is a fundamental tool in cryptography. All key generation algo- 
rithms use randomness to generate the keys and it is used in several well-known 
cryptographic protocols such as DSA, ECDSA, Schnorr signature scheme, etc. 
A pseudo-random number generator (PRNG) is an efficient deterministic algo- 
rithm that stretches a small random seed into a longer pseudo-random sequence 
of numbers. It is an efficient way to create pseudo-randomness for use in 
cryptographic protocols. A PRNG used in a cryptographic protocol needs to produce 
a sequence of bits indistinguishable from “truly” random bits by efficient 
adversaries, or the whole protocol might become insecure. PRNGs of cryptographic 
strength exist, and some of them have been approved by NIST [7]. 


© Springer Nature Switzerland AG 2022 
Because of the miniaturization of components and the emergence of the Internet 
of Things, we face a new cryptographic challenge in which highly-constrained 
devices must wirelessly and securely communicate with one another. The 
standardized PRNGs available do not fit into these constrained devices, which is 
why we are looking for lighter PRNGs. In [9], NIST presented several 
generally-desired properties that they would use to evaluate the design of future 
lightweight cryptographic protocols. They strongly underline the fact that the 
security should be of at least 112 bits. 

The lagged Fibonacci generators (LFG) are a class of linear generators. An LFG 
is defined by four parameters $(r, s, N, m)$ and an initial internal state composed 
of $r$ words of size $N$: $(x_{-r}, \dots, x_{-1})$. At step $n$, the internal state of the generator 
is $(x_{n-r}, \dots, x_{n-1})$. The generator computes $x_n$ as $x_n = x_{n-r} + x_{n-s} \bmod m$, 
outputs $x_n$ and updates its internal state to $(x_{n-r+1}, \dots, x_n)$. These generators are 
light and fast, as needed for lightweight cryptography, but highly insecure. They 
have poor statistical properties, which make them easily distinguishable from the 
uniform distribution, and they are easily predictable (as we can obtain the full 
internal state by clocking the generator enough times). 
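The predictability of an LFG can be illustrated with a minimal Python sketch. The parameters and seed below are toy values chosen only for illustration (they are not taken from Arrow or Trifork), and the function name `lfg` is ours. After observing $r$ consecutive outputs, an attacker holds the full internal state and can clone the generator.

```python
from collections import deque

def lfg(seed, r, s, m):
    """Yield x_n = x_{n-r} + x_{n-s} mod m, with seed = (x_{-r}, ..., x_{-1})."""
    assert len(seed) == r and 0 < s < r
    state = deque(seed, maxlen=r)           # state = (x_{n-r}, ..., x_{n-1})
    while True:
        x = (state[0] + state[r - s]) % m   # state[0] = x_{n-r}, state[r-s] = x_{n-s}
        state.append(x)                     # drops x_{n-r}, keeps (x_{n-r+1}, ..., x_n)
        yield x

# Toy parameters (assumption, for illustration only).
r, s, m = 5, 2, 1 << 16
victim = lfg([11, 22, 33, 44, 55], r, s, m)
observed = [next(victim) for _ in range(r)]  # r outputs = the whole internal state
clone = lfg(observed, r, s, m)               # attacker's copy, seeded with the outputs
assert [next(clone) for _ in range(10)] == [next(victim) for _ in range(10)]
```

The clone needs no guessing at all, which is why a bare LFG offers no security once $r$ outputs are known.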

The goal of Arrow, presented by Lopez et al. [11], was to use two of these LFGs 
to keep their lightweight properties while combining them in a way that would make 
the resulting PRNG more secure. To improve the security of these new PRNGs, 
the authors used two LFGs of different lengths and combined them using both 
modular arithmetic over Z/mZ and modular arithmetic over Z/2Z, as combining 
two moduli tends to break the linearity of the operations. The sequences 
generated by Arrow successfully pass all of Marsaglia’s Diehard randomness test 
suite and the NIST randomness tests. The statistical distribution 
of the outputs of Arrow was studied further by Blanco et al. [5] in 2019. 

The idea behind Arrow derives from an older family of PRNGs: Trifork. 
Trifork was presented in 2010 by Orue et al. [12]. The generators in Trifork 
combine three Lagged Fibonacci Generators together, again mixing modular 
arithmetic over Z/mZ and over Z/2Z. They also use a Linear Congruential 
Generator to initialise their large internal states. These large internal states are 
the main reason Trifork is not suited for lightweight cryptography. These PRNGs 
have a key of 192 bits and a claimed security of 192 bits. 

The Linear Congruential Generators (LCG) are another class of linear generators. 
An LCG is defined by three (often) public parameters $a$, $c$, $m$ and a secret 
seed $x_0$. At step $i > 0$ the generator outputs $x_i = a x_{i-1} + c \bmod m$. These 
generators are well studied and generally not cryptographically secure. 
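As a small illustration, an LCG with public parameters can be written in a few lines of Python. The parameter values and the helper name `lcg_prev` are our own assumptions for the example. When $a$ is invertible modulo $m$, a single full output is enough both to predict the stream and to clock the generator backwards.

```python
def lcg(x0, a, c, m):
    """Yield x_i = a * x_{i-1} + c mod m for i = 1, 2, ..."""
    x = x0
    while True:
        x = (a * x + c) % m
        yield x

def lcg_prev(x, a, c, m):
    """Recover x_{i-1} from x_i; requires gcd(a, m) = 1 (Python 3.8+ for pow(a, -1, m))."""
    return (pow(a, -1, m) * (x - c)) % m

# Toy parameters (assumption): a odd, m a power of two.
a, c, m = 5, 3, 16
gen = lcg(7, a, c, m)
x1 = next(gen)
assert lcg_prev(x1, a, c, m) == 7   # one output reveals the seed
```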


Contribution. We show that Arrow, even if it has good statistical properties, 
is insecure. We present several practical algorithms to attack different versions 
of Arrow presented in the original paper, using the same choice of parameters 
they made for their tests. These algorithms reconstruct the full internal state of 
the PRNG. This allows us to predict the pseudo-random stream deterministically 
and clock the generator backwards. For those attacks we choose a “guess-and- 
determine” approach: some bits of the internal state are guessed; assuming the 
guesses are correct, some other information is computed; a consistency check 
discards bad guesses early on; then candidate internal states are computed and 
fully tested. Unfortunately, our attack is not general and the choice of bits to 
guess depends on the parameters of the underlying LFGs. This is why we need a 
different algorithm for each version of Arrow we want to attack. We will attack 
three different versions of Arrow. 


          | Word size | Key length   | Claimed security | Attack complexity 
Arrow-I   | 8 bits    | 96(128) bits | 96(128) bits     | 38 bits 
Arrow-II  | 16 bits   | 96(128) bits | 96(128) bits     | 48 bits 
Arrow-III | N bits    | 32N bits     | 32N bits         | 7N/2 bits 


For Arrow-I and Arrow-II, the key length is 128 bits but can be shortened to 
96 bits using an IV. For Arrow-III, the attack is practical on a laptop for N = 8. 

We also present an attack against Trifork. The generators in Trifork have 
keys of length 192 bits but we show they cannot have more than 64 bits of 
security. Even if the two families of generators are close, the strategies to attack 
them greatly differ. As the internal state of Trifork is composed of several words 
of size 64 bits, we cannot use a “guess-and-determine” approach. As this internal 
state is large, it cannot be directly initialized with the key. This is why a Linear 
Congruential Generator is used. The LCG will be the breach we will use to 
attack Trifork. We will guess a third of the key (64 bits) and use lattice-based 
techniques to recover the rest of the seed. All the code is available on the 
author's personal website. 


Related Work. Linear Congruential Generators (LCGs) have been extensively studied 
over the years. The main attack against them was presented by Frieze et al. 
in 1984 [8]. In this attack, a Euclidean lattice related to the public parameters 
is built. Then outputs of the LCG are used to create a vector T1, not in the 
lattice but close to a vector T2 such that T2 is in the lattice and contains the 
seed of the generator. The lattice is reduced with the LLL algorithm (a 
polynomial-time reduction algorithm presented by Lenstra, Lenstra, and Lovász 
in 1982) and its new basis is used to solve integer linear equations to retrieve 
the seed of the generator. In 1985, Knuth [10] studied a variant of the LCG: the 
secret LCG, where the usually public parameters are now secret. This variant 
was attacked by Stern in 1987 [14]. 

Guess-and-determine (GD) techniques are mainly used to attack stream 
ciphers. The stream cipher SOBER was presented by Rose in 1998 [13]. In 1999, 
Bleichenbacher et al. presented a first GD attack against SOBER-II [6] and in 
2003 Babbage et al. presented another GD attack against SOBER-t32 [3]. Several 
generators from the NESSIE competition [1] (including SOBER-t32) have 
been attacked with a “guess-and-determine” approach. This is also the case for 
the stream cipher candidates in eSTREAM [2]. A quick 
summary of other GD uses can be found in the survey [4], paragraph 3.10. 
Fig. 1. Description of Arrow 


2 Description of Arrow 


The lagged Fibonacci generators (LFG) are a class of linear generators. An LFG 
is defined by four parameters $(r, s, N, m)$ and an initial internal state composed 
of $r$ words of size $N$: $(x_{-r}, \dots, x_{-1})$. At step $n$, the internal state of the generator 
is $(x_{n-r}, \dots, x_{n-1})$. Then it computes $x_n$ as $x_n = x_{n-r} + x_{n-s} \bmod m$, outputs 
$x_n$ and updates its internal state to $(x_{n-r+1}, \dots, x_n)$. 

Arrow is a more elaborate architecture; its structure is described in Fig. 1. It 
is composed of two LFGs of respective parameters $(r_1, s_1, N, m)$ and $(r_2, s_2, N, m)$. 
The internal states of the first LFG are denoted $(x_i)$, the internal states of the 
second one $(y_i)$ and the outputs $(w_i)$. The values $(x_i)_{-r_1 \le i \le -1}$ and $(y_i)_{-r_2 \le i \le -1}$ 
are the seed of this generator. The parameters $r_1, r_2, s_1, s_2, N, m$ are public. 

Instead of having $x_n = x_{n-s_1} + x_{n-r_1} \bmod m$ and $y_n = y_{n-s_2} + y_{n-r_2} \bmod m$ 
we scramble the two generators to obtain at step $n \ge 0$: 


$x_n = \big((x_{n-r_1} \oplus (y_{n-s_2} \ll d_1)) + (x_{n-s_1} \oplus (y_{n-r_2} \gg d_3))\big) \bmod m$   (1) 
$y_n = \big((y_{n-r_2} \oplus (x_{n-s_1} \ll d_2)) + (y_{n-s_2} \oplus (x_{n-r_1} \gg d_4))\big) \bmod m$   (2) 


where $d_1$, $d_2$, $d_3$ and $d_4$ are four public constants satisfying $0 < d_i < N$; $\oplus$ is the 
bitwise exclusive-or; $\gg$ and $\ll$ are the right-shift and left-shift operators (as 
defined in C, not as rotations). The output at step $n$ is: 


$w_n = x_n \oplus y_n$. 
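The update rule can be sketched in Python as follows. This is our reading of Eqs. (1) and (2), namely $x_n = ((x_{n-r_1} \oplus (y_{n-s_2} \ll d_1)) + (x_{n-s_1} \oplus (y_{n-r_2} \gg d_3))) \bmod m$ and symmetrically for $y_n$, which is the reading consistent with the sub-word equations of Sect. 3.1. The function name `arrow` and the seed values are illustrative assumptions, not the reference implementation.

```python
def arrow(xs, ys, r1, s1, r2, s2, m, d1, d2, d3, d4):
    """Yield w_n = x_n ^ y_n; xs = (x_{-r1}, ..., x_{-1}), ys = (y_{-r2}, ..., y_{-1})."""
    xs, ys = list(xs), list(ys)
    assert len(xs) == r1 and len(ys) == r2
    while True:
        x_r, x_s = xs[-r1], xs[-s1]          # x_{n-r1}, x_{n-s1}
        y_r, y_s = ys[-r2], ys[-s2]          # y_{n-r2}, y_{n-s2}
        x = ((x_r ^ (y_s << d1)) + (x_s ^ (y_r >> d3))) % m
        y = ((y_r ^ (x_s << d2)) + (y_s ^ (x_r >> d4))) % m
        xs.append(x); xs.pop(0)              # slide both windows by one word
        ys.append(y); ys.pop(0)
        yield x ^ y

# Arrow-II parameters from Sect. 3.1 (N = 16, m = 65536, r1 = 5, s1 = 2, r2 = 3, s2 = 1,
# all shifts equal to 4); the seed words are arbitrary.
gen = arrow([1, 2, 3, 4, 5], [6, 7, 8], 5, 2, 3, 1, 1 << 16, 4, 4, 4, 4)
stream = [next(gen) for _ in range(8)]
assert all(0 <= w < 1 << 16 for w in stream)
```

Only the xor $w_n$ of the two new words leaves the generator, which is exactly the relation the attacks exploit.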


The security of Arrow is based on the secrecy of the internal states. If we 
clock the generator $r_2$ times, then for all $i \in \{0, \dots, r_2 - 1\}$, we know the value 
$x_i \oplus y_i$ (which is equal to $w_i$). This is the main weakness we are going to exploit 
in the following attacks. 
3 Attacks on Arrow 


3.1 Simple Guess-and-Determine Attack on Arrow-II 


We first consider a hardware version of Arrow (denoted Arrow-II in the 
introduction), presented in the original paper, with words of size N = 16. The set of 
parameters used is 


N  | m     | r1 | s1 | r2 | s2 | d1 = d2 = d3 = d4 
16 | 65536 | 5  | 2  | 3  | 1  | 4 


and the claimed security is 128 bits (96 bits if a public IV is used). 
If we decide to split all the relevant words of size 16 into four sub-words of 4 
bits, we can represent the internal state of this variant of Arrow as follows: 


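x:  [ $a_n$ $b_n$ $c_n$ $d_n$ ] [ · ] [ · ] [ $e_n$ $f_n$ $g_n$ $h_n$ ] [ · ]   →   $x_n^{(4)}$ $x_n^{(3)}$ $x_n^{(2)}$ $x_n^{(1)}$ 
y:  [ $i_n$ $j_n$ $k_n$ $l_n$ ] [ · ] [ $m_n$ $n_n$ $o_n$ $p_n$ ]   →   $y_n^{(4)}$ $y_n^{(3)}$ $y_n^{(2)}$ $y_n^{(1)}$ 

(first row: the words $x_{n-5}, \dots, x_{n-1}$; second row: the words $y_{n-3}, \dots, y_{n-1}$; most significant sub-word first) 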


We also split the outputs $w_n$ of size 16 into four sub-outputs of 4 bits: $w_n^{(1)}$, 
$w_n^{(2)}$, $w_n^{(3)}$ and $w_n^{(4)}$, with $w_n^{(1)}$ being the least significant bits of $w_n$ and $w_n^{(4)}$ the 
most significant bits. 


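Eqs. (1) and (2) become: 


$x_n^{(1)} = d_n + (h_n \oplus k_n) \bmod 16$   (3) 
$c_x^{(1)} = (d_n + (h_n \oplus k_n)) \;\mathrm{div}\; 16$   (4) 
$x_n^{(2)} = (c_n \oplus p_n) + (g_n \oplus j_n) + c_x^{(1)} \bmod 16$   (5) 
$c_x^{(2)} = ((c_n \oplus p_n) + (g_n \oplus j_n) + c_x^{(1)}) \;\mathrm{div}\; 16$   (6) 
$x_n^{(3)} = (b_n \oplus o_n) + (f_n \oplus i_n) + c_x^{(2)} \bmod 16$   (7) 
$c_x^{(3)} = ((b_n \oplus o_n) + (f_n \oplus i_n) + c_x^{(2)}) \;\mathrm{div}\; 16$   (8) 
$x_n^{(4)} = ((a_n \oplus n_n) + e_n + c_x^{(3)}) \bmod 16$   (9) 
$y_n^{(1)} = l_n + (c_n \oplus p_n) \bmod 16$   (10) 
$c_y^{(1)} = (l_n + (c_n \oplus p_n)) \;\mathrm{div}\; 16$   (11) 
$y_n^{(2)} = (h_n \oplus k_n) + (b_n \oplus o_n) + c_y^{(1)} \bmod 16$   (12) 
$c_y^{(2)} = ((h_n \oplus k_n) + (b_n \oplus o_n) + c_y^{(1)}) \;\mathrm{div}\; 16$   (13) 
$y_n^{(3)} = (g_n \oplus j_n) + (a_n \oplus n_n) + c_y^{(2)} \bmod 16$   (14) 
$c_y^{(3)} = ((g_n \oplus j_n) + (a_n \oplus n_n) + c_y^{(2)}) \;\mathrm{div}\; 16$   (15) 
$y_n^{(4)} = ((f_n \oplus i_n) + m_n + c_y^{(3)}) \bmod 16$   (16) 
$x_n^{(1)} \oplus y_n^{(1)} = w_n^{(1)}$   (17) 
$x_n^{(2)} \oplus y_n^{(2)} = w_n^{(2)}$   (18) 
$x_n^{(3)} \oplus y_n^{(3)} = w_n^{(3)}$   (19) 
$x_n^{(4)} \oplus y_n^{(4)} = w_n^{(4)}$   (20) 


where “div” denotes the integer division. The $c_x^{(i)}$ and $c_y^{(i)}$ are the carries we 
must work with. Their value is either 0 or 1. The $(w_i)$ are known as they are the 
outputs. 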


Our attack will be based on a classical “guess-and-determine” approach. The 
guessed bits will appear in red, the derived bits at the first step in blue, and the 
derived bits at the second step in olive. In this case, the attack is very simple: 
we start by clocking 3 times our generator. 


Step 1 We guess $a_3, b_3, c_3, d_3, e_3, f_3, g_3, h_3, i_3, j_3, k_3, l_3$ (hence 48 bits). With $d_3$, 
       $h_3$ and $k_3$ we compute $x_3^{(1)}$ and $c_x^{(1)}$ (Eqs. 3 and 4). Then we compute $y_3^{(1)}$ 
       with $x_3^{(1)}$ and $w_3^{(1)}$ (Eq. 17) and retrieve $p_3$ as we know $l_3$ and $c_3$ (Eq. 10). 
       The knowledge of $p_3$ allows us to compute $x_3^{(2)}$ (Eq. 5), recover $y_3^{(2)}$ (Eq. 
       18) and then $o_3$ (Eq. 12). With $o_3$ we can compute $x_3^{(3)}$ (Eq. 7), recover 
       $y_3^{(3)}$ (Eq. 19) and then $n_3$ (Eq. 14). And finally, with $n_3$ we can compute 
       $x_3^{(4)}$ (Eq. 9) and recover $y_3^{(4)}$ (Eq. 20) as well as $m_3$ (Eq. 16). As we know 
       $w_0, w_1, w_2$, we can fill up the internal states above $i_3, j_3, k_3, l_3$ and $m_3$, 
       $n_3, o_3, p_3$ and under $e_3, f_3, g_3, h_3$ (Eqs. 17, 18, 19 and 20). 

Step 2 We clock the generator twice. As explained above, we have derived $a_5$, 
       $b_5, c_5, d_5$ from $i_3, j_3, k_3, l_3$ and $w_0$. The values $e_5, f_5, g_5, h_5$ are $x_3^{(4)}$, 
       $x_3^{(3)}, x_3^{(2)}, x_3^{(1)}$ and $i_5, j_5, k_5, l_5$ are $m_3, n_3, o_3, p_3$. We remark that we 
       are in a similar situation as step 1, hence we use the same equations to 
       derive $m_5, n_5, o_5, p_5$ as well as $x_5^{(1)}, x_5^{(2)}, x_5^{(3)}, x_5^{(4)}$, $y_5^{(1)}, y_5^{(2)}, y_5^{(3)}$ and 
       $y_5^{(4)}$. 
       The values above $m_5, n_5, o_5, p_5$ can be computed thanks to $w_4$. 
       At this point, we know the full internal state of the generator. 

Step 3 We compute the five following outputs using the internal states we have 
and we compare them with the true outputs given by the generator. If 
they are equal it means we have recovered the full internal state of the 
generator with overwhelming probability. We notice that the generator 
is easily invertible, hence we can recover the seed. 
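The derivation used in Steps 1 and 2 can be sketched in Python. Assuming the three words $x_{n-r_1}$, $x_{n-s_1}$ and $y_{n-r_2}$ are guessed, the output $w_n$ determines $y_{n-s_2}$ nibble by nibble, following the chain of Eqs. (3) to (20). The names `arrow_words`, `nib` and `derive_P` are ours, and the update rule is our reading of Eqs. (1) and (2) with all shifts equal to 4; this is a sketch, not the C implementation used for the experiments.

```python
import random

D, MASK = 4, 0xFFFF   # Arrow-II: shifts of 4 bits, words of 16 bits

def arrow_words(X, E, I, P):
    """One step from X = x_{n-r1}, E = x_{n-s1}, I = y_{n-r2}, P = y_{n-s2}."""
    x = ((X ^ (P << D)) + (E ^ (I >> D))) & MASK
    y = ((I ^ (E << D)) + (P ^ (X >> D))) & MASK
    return x, y

def nib(v, k):
    """k-th 4-bit sub-word of v (k = 0 is the least significant)."""
    return (v >> (4 * k)) & 0xF

def derive_P(X, E, I, w):
    """Recover P = y_{n-s2} from the guessed words X, E, I and the output w."""
    P, cx, cy = 0, 0, 0
    for k in range(4):
        # Sub-word k of x_n only involves sub-words of P below k, already recovered.
        sx = nib(X ^ (P << D), k) + nib(E ^ (I >> D), k) + cx
        xk, cx = sx & 0xF, sx >> 4
        yk = xk ^ nib(w, k)                  # Eqs. (17)-(20)
        A = nib(I ^ (E << D), k)             # known summand of y_n
        t = (yk - A - cy) & 0xF              # t = P_k xor (sub-word k+1 of X)
        P |= (t ^ nib(X >> D, k)) << (4 * k)
        cy = (A + t + cy) >> 4
    return P

rng = random.Random(1)
for _ in range(1000):
    X, E, I, P = (rng.randrange(1 << 16) for _ in range(4))
    x, y = arrow_words(X, E, I, P)
    assert derive_P(X, E, I, x ^ y) == P
```

In the attack the three input words are the 48 guessed bits; the word returned by `derive_P` then plays the role of $m_3, n_3, o_3, p_3$.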


This particular version of Arrow was supposed to have between 96 and 128 
bits of security (depending on whether or not an IV was used) and with this 
attack, we show it cannot have more than 48 bits of security which is far from 
the 112 bits of security recommended by NIST for lightweight cryptography. 
This attack has been implemented in C but is not practical on a standard 
laptop (a Dell Latitude 7400 running Ubuntu 18.04; the same laptop is 
used for the rest of this paper). If we only test a hundred sets of guesses, the 
algorithm runs in 0.000144s. To retrieve the full internal state of the generator, 
the algorithm should run for approximately 12 years. 
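The running-time estimate follows from a simple extrapolation, sketched below under the assumption that the cost per guess stays constant over the whole $2^{48}$ search space (variable names are ours).

```python
GUESS_BITS = 48                      # bits guessed in Step 1
SECONDS_PER_100_GUESSES = 0.000144   # timing reported above for a hundred guesses

total_seconds = (2 ** GUESS_BITS) * SECONDS_PER_100_GUESSES / 100
years = total_seconds / (365.25 * 24 * 3600)
print(f"estimated single-run time: {years:.1f} years")
```

This is a worst-case figure for an exhaustive sweep; on average the correct guess would be found after about half of it.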


3.2 Longer Guess-and-Determine Attack on Arrow-I 


Arrow-I is another hardware version of Arrow presented in the original paper, 
this time with words of size N = 8. The set of parameters used is 


N | m   | r1 | s1 | r2 | s2 | d1 = d2 = d3 = d4 
8 | 256 | 9  | 4  | 7  | 3  | 4 


and the claimed security is 128 bits (96 bits if a public IV is used). 
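If we decide to split all the relevant words of 8 bits into two sub-words of 4 
bits, we can represent the internal state of this variant of Arrow as follows: 

x:  [ $a_n$ $b_n$ ] = $x_{n-9}$, [ $c_n$ $d_n$ ] = $x_{n-4}$   →   $x_n^{(2)}$ $x_n^{(1)}$ 
y:  [ $e_n$ $f_n$ ] = $y_{n-7}$, [ $g_n$ $h_n$ ] = $y_{n-3}$   →   $y_n^{(2)}$ $y_n^{(1)}$ 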


We also split the outputs $w_n$ of 8 bits into two sub-words of 4 bits: $w_n^{(1)}$ and 
$w_n^{(2)}$, with $w_n^{(1)}$ being the least significant bits of $w_n$ and $w_n^{(2)}$ the most significant 
bits. 
Eqs. (1) and (2) become: 


$x_n^{(1)} = b_n + (e_n \oplus d_n) \bmod 2^{N/2}$   (21) 
$c_x = (b_n + (e_n \oplus d_n)) \;\mathrm{div}\; 2^{N/2}$   (22) 
$y_n^{(1)} = f_n + (a_n \oplus h_n) \bmod 2^{N/2}$   (23) 
$c_y = (f_n + (a_n \oplus h_n)) \;\mathrm{div}\; 2^{N/2}$   (24) 
$x_n^{(2)} = (c_n + (a_n \oplus h_n) + c_x) \bmod 2^{N/2}$   (25) 
$y_n^{(2)} = (g_n + (e_n \oplus d_n) + c_y) \bmod 2^{N/2}$   (26) 


We start the attack by clocking the generator seven times. Then, for every 
$n \ge 7$, $e_n, f_n = y_{n-7}^{(2)}, y_{n-7}^{(1)}$, $c_n, d_n = x_{n-4}^{(2)}, x_{n-4}^{(1)}$ and $g_n, h_n = y_{n-3}^{(2)}, y_{n-3}^{(1)}$. If we 
denote $\hat e_i, \hat f_i$ the values above $e_i, f_i$, we see that we can easily derive them from 
$e_i, f_i$ and $w_{i-7}$. We also denote $\hat g_i, \hat h_i$ the values above $g_i, h_i$ and $\hat c_i, \hat d_i$ the values 
under $c_i, d_i$. 


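Step 0: guess $b_7$, $g_7$, $(e_7 \oplus d_7)$, $(a_7 \oplus h_7)$ 
        determine → $(x_7^{(1)}, y_7^{(1)}, f_7, y_7^{(2)}, x_7^{(2)}, c_7)$ 

Step 1: $b_9 = \hat f_7$ 
        guess $g_9$, $(e_9 \oplus d_9)$, $(a_9 \oplus h_9)$ 
        determine → $(x_9^{(1)}, y_9^{(1)}, f_9, y_9^{(2)}, x_9^{(2)}, c_9)$ 

Step 2: $b_{11} = \hat f_9$, $c_{11} = x_7^{(2)}$, $d_{11} = x_7^{(1)}$, $e_{11} = g_7$ 
        guess $f_{11}$ 
        determine → $(x_{11}^{(1)}, y_{11}^{(1)}, x_{11}^{(2)}, y_{11}^{(2)}, g_{11})$ 

Step 3: $a_{12} = c_7$, $c_{12} = \hat g_{11}$, $e_{12} = \hat c_9$, $g_{12} = y_9^{(2)}$, $h_{12} = y_9^{(1)}$ 
        guess $b_{12}$, $c_x$, $c_y$ 
        determine → $(x_{12}^{(2)}, y_{12}^{(2)}, d_{12}, x_{12}^{(1)}, y_{12}^{(1)}, f_{12})$ 

Step 4: $a_{15} = \hat g_9$, $c_{15} = x_{11}^{(2)}$, $d_{15} = x_{11}^{(1)}$, $e_{15} = g_{11}$, $g_{15} = y_{12}^{(2)}$, $h_{15} = y_{12}^{(1)}$ 
        determine → $(y_{15}^{(1)}, x_{15}^{(1)}, b_{15}, x_{15}^{(2)}, y_{15}^{(2)})$ 

Step 5: $a_{16} = x_7^{(2)}$, $b_{16} = x_7^{(1)}$, $c_{16} = x_{12}^{(2)}$, $d_{16} = x_{12}^{(1)}$, $e_{16} = y_9^{(2)}$, $f_{16} = y_9^{(1)}$ 
        determine → $(x_{16}^{(1)}, y_{16}^{(1)}, h_{16}, x_{16}^{(2)}, y_{16}^{(2)}, g_{16})$ 

Step 6: $a_{18} = x_9^{(2)}$, $b_{18} = x_9^{(1)}$, $e_{18} = y_{11}^{(2)}$, $f_{18} = y_{11}^{(1)}$, $g_{18} = y_{15}^{(2)}$, $h_{18} = y_{15}^{(1)}$ 
        determine → $(y_{18}^{(1)}, x_{18}^{(1)}, d_{18}, y_{18}^{(2)}, x_{18}^{(2)}, c_{18})$ 

Step 7: $a_{20} = x_{11}^{(2)}$, $b_{20} = x_{11}^{(1)}$, $c_{20} = x_{16}^{(2)}$, $d_{20} = x_{16}^{(1)}$, $e_{20} = g_{16}$, $f_{20} = h_{16}$ 
        determine → $(x_{20}^{(1)}, y_{20}^{(1)}, h_{20}, x_{20}^{(2)}, y_{20}^{(2)}, g_{20})$ 

Step 8: $a_{21} = x_{12}^{(2)}$, $b_{21} = x_{12}^{(1)}$, $c_{21} = \hat g_{20}$, $d_{21} = \hat h_{20}$, $e_{21} = y_{14}^{(2)}$, $f_{21} = y_{14}^{(1)}$, 
        $g_{21} = y_{18}^{(2)}$, $h_{21} = y_{18}^{(1)}$ 
        determine → $(x_{21}^{(1)}, y_{21}^{(1)}, x_{21}^{(2)}, y_{21}^{(2)})$ 

Step 9: $a_{22} = \hat g_{16}$, $b_{22} = \hat h_{16}$, $c_{22} = x_{18}^{(2)}$, $d_{22} = x_{18}^{(1)}$, $e_{22} = y_{15}^{(2)}$, $f_{22} = y_{15}^{(1)}$ 
        determine → $(x_{22}^{(1)}, y_{22}^{(1)}, h_{22}, x_{22}^{(2)}, y_{22}^{(2)}, g_{22})$ 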

At the end of Step 9, we have derived from our guesses the whole internal 
state of the generator. We use these values to compute the five following outputs 
and compare them to the five “true” outputs given by the original generator to 
know if our guesses were correct or not with overwhelming probability. As we 
guess 16 bits in Step 0, 12 bits in Step 1, 4 bits in Step 2, and 6 bits in Step 
3, our time complexity will be approximately $O(2^{38})$. We recall that the security 
of this generator was supposed to be of at least 96 bits. This attack has been 
implemented in C and runs in 20 min over 8 threads, compiled with the -O3 
option, on a standard laptop. 


3.3 An Attack Against Arrow-III, the Software Version of Arrow 


The software version of Arrow with words of size N is using the following set of 
parameters 


with N =8 or N = 32. 
If we decide to split all the relevant words of N bits into two sub-words of 
N/2 bits, we can represent the internal state of this variant of Arrow as follows: 




We obtain the same equations as in the previous case. 
This version of Arrow has two specificities: 


– The values $c_i, d_i$ are above $g_i, h_i$. Hence if the generator has been clocked 
  enough times and if we know $g_i$ and $h_i$, then we know $c_i$ and $d_i$. 
– The two lagged Fibonacci generators used in this version of Arrow are more 
  or less synchronized (which is something that should have been avoided). If 
  we call $t$ the difference between $r_1$ and $r_2$, we notice that $t = r_1 - r_2 = r_2 - s_2$. 
  Hence, if we know $e_i, f_i, c_i, d_i$ we will know $a_{i+14}, b_{i+14}, e_{i+14}, f_{i+14}$. It will 
  ease our guess-and-determine attack. 


Because of that, in our attack we will only face three cases: 


Case gh We know $a_i, b_i, e_i, f_i$, we guess $g_i, h_i$ and derive $c_i, d_i, x_i, y_i$ with the 
        help of $w_{i-3}$ and $w_i$. We compare $x_i^{(2)} \oplus y_i^{(2)}$ to $w_i^{(2)}$. 
Case a  We know $e_i, f_i, g_i, h_i$, we guess $a_i$ and derive $x_i, y_i$ with the help of $w_i$. 
        We compare $x_i^{(2)} \oplus y_i^{(2)}$ to $w_i^{(2)}$. 
Case 0  We know all the relevant values, we derive $x_i, y_i$ from them and compare 
        $x_i \oplus y_i$ to the output $w_i$. 


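We start by clocking the generator 17 times to know all the xors between $x_i$ 
and $y_i$ for $i$ in $\{0, \dots, 16\}$. 


Step 0: guess $a_{17}, e_{17}, f_{17}, g_{17}, h_{17}$ 
        determine → $(c_{17}, d_{17}, b_{17}, x_{17}^{(1)}, y_{17}^{(1)}, x_{17}^{(2)}, y_{17}^{(2)})$ 
        assert $x_{17}^{(2)} \oplus y_{17}^{(2)} = w_{17}^{(2)}$ 

Step 1 (case gh): $a_{31} = \hat e_{17}$, $b_{31} = \hat f_{17}$, $e_{31} = g_{17}$, $f_{31} = h_{17}$ 
        guess $g_{31}, h_{31}$ 
        determine → $(c_{31}, d_{31}, x_{31}, y_{31})$ 
        assert $x_{31}^{(2)} \oplus y_{31}^{(2)} = w_{31}^{(2)}$ 

Step 2 (case a): $c_{34} = x_{31}^{(2)}$, $d_{34} = x_{31}^{(1)}$, $e_{34} = y_{17}^{(2)}$, $f_{34} = y_{17}^{(1)}$, $g_{34} = y_{31}^{(2)}$, $h_{34} = y_{31}^{(1)}$ 
        guess $a_{34}$ 
        determine → $(x_{34}, y_{34})$ 
        assert $x_{34}^{(2)} \oplus y_{34}^{(2)} = w_{34}^{(2)}$ 

Step 3 (case gh): $a_{45} = c_{17}$, $b_{45} = d_{17}$, $e_{45} = g_{31}$, $f_{45} = h_{31}$ 
        guess $g_{45}, h_{45}$ 
        determine → $(c_{45}, d_{45}, x_{45}, y_{45})$ 
        assert $x_{45}^{(2)} \oplus y_{45}^{(2)} = w_{45}^{(2)}$ 

Step 4 (case 0): $a_{48} = x_{17}^{(2)}$, $b_{48} = x_{17}^{(1)}$, $c_{48} = x_{45}^{(2)}$, $d_{48} = x_{45}^{(1)}$, $e_{48} = y_{31}^{(2)}$, $f_{48} = y_{31}^{(1)}$, 
        $g_{48} = y_{45}^{(2)}$, $h_{48} = y_{45}^{(1)}$ 
        determine → $(x_{48}, y_{48})$ 
        assert $x_{48} \oplus y_{48} = w_{48}$ 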


In step 0, there are $2^{5N/2}$ possibilities for the set of values $\{a_{17}, e_{17}, f_{17}, g_{17}, 
h_{17}\}$. Thanks to the first filter, on average only $2^{4N/2}$ possibilities are still on 
course for step 1. 

In step 1, there are $2^{6N/2}$ possibilities for the set of values $\{a_{17}, e_{17}, f_{17}, g_{17}, 
h_{17}, g_{31}, h_{31}\}$ ($2^{4N/2}$ from step 0 and $2^{2N/2}$ for $g_{31}, h_{31}$). Thanks to the filter, 
on average only $2^{5N/2}$ possibilities remain for step 2. 
In step 2, there are $2^{6N/2}$ possibilities for the set of values $\{a_{17}, e_{17}, f_{17}, g_{17}, 
h_{17}, g_{31}, h_{31}, a_{34}\}$ ($2^{5N/2}$ from step 1 and $2^{N/2}$ for $a_{34}$). Thanks to the filter, on 
average only $2^{5N/2}$ possibilities remain for step 3. 

In step 3 we consider $2^{5N/2} \times 2^{2N/2}$ possibilities; on average only $2^{6N/2}$ of 
them pass the filter. 


In step 4 we consider $2^{6N/2}$ possibilities; on average only $2^{4N/2}$ of them pass 
the filter. 
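Step 5 (case a): $c_{51} = x_{48}^{(2)}$, $d_{51} = x_{48}^{(1)}$, $e_{51} = y_{34}^{(2)}$, $f_{51} = y_{34}^{(1)}$, $g_{51} = y_{48}^{(2)}$, $h_{51} = y_{48}^{(1)}$ 
        guess $a_{51}$ 
        determine → $(x_{51}, y_{51})$ 
        assert $x_{51}^{(2)} \oplus y_{51}^{(2)} = w_{51}^{(2)}$ 

Step 6 (case gh): $a_{59} = c_{31}$, $b_{59} = d_{31}$, $e_{59} = g_{45}$, $f_{59} = h_{45}$ 
        guess $g_{59}, h_{59}$ 
        determine → $(c_{59}, d_{59}, x_{59}, y_{59})$ 
        assert $x_{59}^{(2)} \oplus y_{59}^{(2)} = w_{59}^{(2)}$ 

Step 7 (case 0): $a_{62} = x_{31}^{(2)}$, $b_{62} = x_{31}^{(1)}$, $c_{62} = x_{59}^{(2)}$, $d_{62} = x_{59}^{(1)}$, $e_{62} = y_{45}^{(2)}$, $f_{62} = y_{45}^{(1)}$, 
        $g_{62} = y_{59}^{(2)}$, $h_{62} = y_{59}^{(1)}$ 
        determine → $(x_{62}, y_{62})$ 
        assert $x_{62} \oplus y_{62} = w_{62}$ 

Step 8 (case 0): $a_{65} = x_{34}^{(2)}$, $b_{65} = x_{34}^{(1)}$, $c_{65} = x_{62}^{(2)}$, $d_{65} = x_{62}^{(1)}$, $e_{65} = y_{48}^{(2)}$, $f_{65} = y_{48}^{(1)}$, 
        $g_{65} = y_{62}^{(2)}$, $h_{65} = y_{62}^{(1)}$ 
        determine → $(x_{65}, y_{65})$ 
        assert $x_{65} \oplus y_{65} = w_{65}$ 

Step 9 (case a): $c_{68} = x_{65}^{(2)}$, $d_{68} = x_{65}^{(1)}$, $e_{68} = y_{51}^{(2)}$, $f_{68} = y_{51}^{(1)}$, $g_{68} = y_{65}^{(2)}$, $h_{68} = y_{65}^{(1)}$ 
        guess $a_{68}$ 
        determine → $(x_{68}, y_{68})$ 
        assert $x_{68}^{(2)} \oplus y_{68}^{(2)} = w_{68}^{(2)}$ 

Step 10 (case gh): $a_{73} = c_{45}$, $b_{73} = d_{45}$, $e_{73} = g_{59}$, $f_{73} = h_{59}$ 
        guess $g_{73}, h_{73}$ 
        determine → $(c_{73}, d_{73}, x_{73}, y_{73})$ 
        assert $x_{73}^{(2)} \oplus y_{73}^{(2)} = w_{73}^{(2)}$ 

Step 11 (case 0): $a_{76} = x_{45}^{(2)}$, $b_{76} = x_{45}^{(1)}$, $c_{76} = x_{73}^{(2)}$, $d_{76} = x_{73}^{(1)}$, $e_{76} = y_{59}^{(2)}$, $f_{76} = y_{59}^{(1)}$, 
        $g_{76} = y_{73}^{(2)}$, $h_{76} = y_{73}^{(1)}$ 
        determine → $(x_{76}, y_{76})$ 
        assert $x_{76} \oplus y_{76} = w_{76}$ 

Step 12 (case 0): $a_{79} = x_{48}^{(2)}$, $b_{79} = x_{48}^{(1)}$, $c_{79} = x_{76}^{(2)}$, $d_{79} = x_{76}^{(1)}$, $e_{79} = y_{62}^{(2)}$, $f_{79} = y_{62}^{(1)}$, 
        $g_{79} = y_{76}^{(2)}$, $h_{79} = y_{76}^{(1)}$ 
        determine → $(x_{79}, y_{79})$ 
        assert $x_{79} \oplus y_{79} = w_{79}$ 

Step 13 (case 0): $a_{82} = x_{51}^{(2)}$, $b_{82} = x_{51}^{(1)}$, $c_{82} = x_{79}^{(2)}$, $d_{82} = x_{79}^{(1)}$, $e_{82} = y_{65}^{(2)}$, $f_{82} = y_{65}^{(1)}$, 
        $g_{82} = y_{79}^{(2)}$, $h_{82} = y_{79}^{(1)}$ 
        determine → $(x_{82}, y_{82})$ 
        assert $x_{82} \oplus y_{82} = w_{82}$ 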


We keep repeating these three steps (case a, case gh, and case 0) until we 
reach n = 243. It takes another 110 steps to go there. At this point, we will have 
derived the full internal state of the generator and only one guess would have 
passed all the filters with overwhelming probability. This attack has been fully 
implemented in C. For N = 8 the attack is practical as it runs in 20 s over 8 
threads on a standard laptop: a Dell Latitude 7400 running Ubuntu 18.04. 

In each step, there are never more than $2^{7N/2}$ possibilities tested (the 
maximum is in step 3). We can assume that the complexity is roughly $2^{7N/2}$. For 
N = 8, we obtain $2^{28}$, which is coherent with our experimental results. For 
N = 32, it would give 112 bits of security, which is enough for NIST’s standards, 
but far lower than the claim of 1024 bits of security. 
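The bookkeeping behind this estimate can be replayed with a short Python sketch. We count candidates as exponents in units of N/2 bits and assume, as in the counts given for steps 0 to 4, that case gh adds two guessed sub-words and filters one, case a adds one and filters one, and case 0 adds none and filters two. The names `CASES` and `candidate_profile` are ours.

```python
# (sub-words guessed, sub-words checked by the filter), in units of N/2 bits.
CASES = {"gh": (2, 1), "a": (1, 1), "0": (0, 2)}

def candidate_profile(first_guess, first_filter, cases):
    """Return (peak, surviving) numbers of candidates, as exponents in units of N/2 bits."""
    alive = peak = first_guess
    alive -= first_filter
    for case in cases:
        guessed, filtered = CASES[case]
        alive += guessed                 # new sub-words are guessed
        peak = max(peak, alive)          # candidates actually tested at this step
        alive -= filtered                # the assert discards wrong candidates
    return peak, alive

# Step 0 guesses 5 sub-words and filters 1; steps 1 to 4 are gh, a, gh, 0.
peak, alive = candidate_profile(5, 1, ["gh", "a", "gh", "0"])
for N in (8, 32):
    print(f"N = {N}: about 2^{peak * N // 2} candidates at the peak")
```

The peak exponent of 7 units is reached in step 3, which is where the $2^{7N/2}$ figure comes from.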


4 Description of Trifork 


Trifork’s structure is described in Fig. 2. 


Fig. 2. Description of Trifork 


A Trifork generator uses three LFGs of respective parameters 
$(r_1, s_1, N, m)$, $(r_2, s_2, N, m)$ and $(r_3, s_3, N, m)$. The internal states of the first 
LFG are denoted $(X_i)$, the internal states of the second one $(Y_i)$, the ones of the 
third $(Z_i)$ and the outputs $(W_i)$. 

The seed of the generator is $(X_{-r_1}, Y_{-r_2}, Z_{-r_3})$. To fill its internal states, it 
uses a Linear Congruential Generator of public parameters $a$, $c$, $m$ with $a$ odd 
and where $m$ is the same as the one used by the LFGs. 


For $i \in \{-r_1 + 1, \dots, -1\}$, $X_i = aX_{i-1} + c \bmod m$ 
For $i \in \{-r_2 + 1, \dots, -1\}$, $Y_i = aY_{i-1} + c \bmod m$ 
For $i \in \{-r_3 + 1, \dots, -1\}$, $Z_i = aZ_{i-1} + c \bmod m$ 


At step $i$, the generator computes 
$X'_i = X_{i-r_1} + X_{i-s_1} \bmod m$ 
$Y'_i = Y_{i-r_2} + Y_{i-s_2} \bmod m$ 
$Z'_i = Z_{i-r_3} + Z_{i-s_3} \bmod m$ 


$X_i = X'_i \oplus (Z'_i \gg d)$   (27) 
$Y_i = Y'_i \oplus (X'_i \gg d)$   (28) 
$Z_i = Z'_i \oplus (Y'_i \gg d)$   (29) 


where $d$ is a constant satisfying $0 < d < N$; $\oplus$ is the bitwise exclusive-or and $\gg$ 
is the right-shift operator. The output at step $n$ is: 


$W_n = X_n \oplus Z_n$. 


The security of Trifork is based on the secrecy of the internal states. We will 
present an algorithm that retrieves $X_{-r_1}$, $Y_{-r_2}$ and $Z_{-r_3}$ in $2^{64}$ steps. 
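A minimal Python sketch of this construction is given below. It makes several assumptions that should be checked against the original specification: $m = 2^n$, the perturbed words $X_i, Y_i, Z_i$ (not the primed ones) are fed back into the LFGs, and the lags are toy values since the actual parameters are not restated here. The function name `trifork` is ours.

```python
def trifork(seed, lags, a, c, n, d):
    """seed = (X_{-r1}, Y_{-r2}, Z_{-r3}); lags = ((r1, s1), (r2, s2), (r3, s3)); m = 2**n."""
    m = 1 << n
    states = []
    for word, (r, _) in zip(seed, lags):
        st = [word % m]
        for _ in range(r - 1):                   # the LCG fills the r - 1 remaining words
            st.append((a * st[-1] + c) % m)
        states.append(st)
    X, Y, Z = states
    (r1, s1), (r2, s2), (r3, s3) = lags
    while True:
        xp = (X[-r1] + X[-s1]) % m               # X'_i
        yp = (Y[-r2] + Y[-s2]) % m               # Y'_i
        zp = (Z[-r3] + Z[-s3]) % m               # Z'_i
        x, y, z = xp ^ (zp >> d), yp ^ (xp >> d), zp ^ (yp >> d)   # Eqs. (27)-(29)
        for st, v in ((X, x), (Y, y), (Z, z)):   # feed the perturbed words back (assumption)
            st.append(v)
            st.pop(0)
        yield x ^ z                              # W_i

# Toy instance (all values are assumptions made for the example).
gen = trifork((1, 2, 3), ((2, 1), (2, 1), (2, 1)), a=5, c=3, n=8, d=4)
first = next(gen)
```

Note that the xor with a value shifted right by $d$ leaves the $d$ upper bits of each primed word untouched, which is what the attack below relies on.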


5 Attack on Trifork 


The reason this attack uses $2^{64}$ steps is that we start by guessing a third 
of the seed: $X_{-r_1}$, of length 64 bits. 


5.1 Recovering $Z_{-r_3}$ 


We consider a parameter $f_1$ that will be the number of outputs we will use to 
recover $Z_{-r_3}$. We will set this parameter later. 

We denote by $[X]_d$ the $d$ upper bits of a value, $\lfloor X \rfloor_d$ its $d$ lower bits and 
consider the two following functions: 


$g : i \mapsto \sum_{j=0}^{i-1} a^j \bmod m$   and   $f : (r, s, i) \mapsto g(r - s + i) + g(i) \bmod m$ 

The first step is to compute an approximation of the $d$ upper bits of the 
values $\{X_0, \dots, X_{f_1-1}\}$. If $i < 0$, $X_i = a(\dots a(aX_{-r_1} + c) + c \dots) + c \bmod m$, that 
we conveniently rewrite $X_i = a^{r_1+i}X_{-r_1} + g(r_1 + i) \times c \bmod m$. If $i \ge 0$, by Eq. 
(27), $[X_i]_d = [X_{i-s_1} + X_{i-r_1} \bmod m]_d$. 
– if $i < s_1$, then $[X_i]_d = [a^i(1 + a^{r_1-s_1})X_{-r_1} + f(r_1, s_1, i) \times c \bmod m]_d$ and we 
  can compute this value correctly. 

– if $i \ge s_1$, then $[X_i]_d \approx [X_{i-r_1}]_d + [X_{i-s_1}]_d = [a^iX_{-r_1} + g(i) \times c \bmod m]_d + 
  [X_{i-s_1}]_d$ and we can only compute the $d - (i - s_1)$ upper bits correctly. 
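The two closed forms used above can be checked numerically with a short Python sketch. The LCG parameters, lags and seed are toy values (assumptions for the example), and the helper `check_closed_forms` is ours.

```python
def g(i, a, m):
    """g(i) = sum_{j=0}^{i-1} a^j mod m."""
    return sum(pow(a, j, m) for j in range(i)) % m

def f(r, s, i, a, m):
    """f(r, s, i) = g(r - s + i) + g(i) mod m."""
    return (g(r - s + i, a, m) + g(i, a, m)) % m

def check_closed_forms(X_seed, a, c, m, r1, s1):
    state = [X_seed % m]                         # state[k] = X_{-r1+k}
    for _ in range(r1 - 1):
        state.append((a * state[-1] + c) % m)
    for k in range(r1):                          # X_i = a^{r1+i} X_{-r1} + g(r1+i) c, with i = k - r1
        assert state[k] == (pow(a, k, m) * X_seed + g(k, a, m) * c) % m
    for i in range(s1):                          # X'_i = X_{i-r1} + X_{i-s1} for 0 <= i < s1
        xp = (state[i] + state[r1 - s1 + i]) % m
        closed = (pow(a, i, m) * (1 + pow(a, r1 - s1, m)) * X_seed
                  + f(r1, s1, i, a, m) * c) % m
        assert xp == closed
    return True

# Toy values (assumption): a odd, m a power of two.
assert check_closed_forms(0xBEEF, a=0x6C05, c=0x3039, m=1 << 16, r1=7, s1=3)
```

For $i < s_1$ both summands of $X'_i$ come straight from the LCG initialisation, which is why the value depends only on the guessed $X_{-r_1}$.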


With that we obtain an approximation of the $d$ upper bits of $\{Z_0, \dots, Z_{f_1-1}\}$ 
knowing that $Z_i = W_i \oplus X_i$. We call these approximations $\tilde Z_i$. 


– if $i < s_3$, then $[Z_i]_d = [a^i(1 + a^{r_3-s_3})Z_{-r_3} + f(r_3, s_3, i) \times c \bmod m]_d$. We set 
  $t_i = \tilde Z_i 2^{n-d} - f(r_3, s_3, i) \times c$. 
  • If $i < s_1$ then $\tilde Z_i = [Z_i]_d$ and $a^i(1 + a^{r_3-s_3})Z_{-r_3} - t_i \bmod m = \lfloor Z_i \rfloor_{n-d}$. 
    Hence $|a^i(1 + a^{r_3-s_3})Z_{-r_3} - t_i| < 2^{n-d}$. 
  • If $i \ge s_1$, $\tilde Z_i$ and $[Z_i]_d$ are only equal on the $d - (i - s_1)$ upper bits. Hence 
    $|a^i(1 + a^{r_3-s_3})Z_{-r_3} - t_i| < 2^{n-d+i-s_1}$. 
– if $i \ge s_3$, then $[Z_i]_d = [a^iZ_{-r_3} + Z_{i-s_3} + g(i) \times c \bmod m]_d$. We set $t_i = 
  (\tilde Z_i - \tilde Z_{i-s_3})2^{n-d} - g(i) \times c$. 
  • If $i < s_1$ then $\tilde Z_i = [Z_i]_d$ and $\tilde Z_{i-s_3} = [Z_{i-s_3}]_d$, so 

    $a^iZ_{-r_3} - t_i = a^iZ_{-r_3} - ([Z_i]_d - [Z_{i-s_3}]_d)2^{n-d} + g(i) \times c \bmod m$ 
    $= Z_{i-r_3} - ([Z_i]_d - [Z_{i-s_3}]_d)2^{n-d} \bmod m$ 
    $= ([Z_{i-r_3}]_d + [Z_{i-s_3}]_d - [Z_{i-s_3} + Z_{i-r_3}]_d)2^{n-d} + \lfloor Z_{i-r_3} \rfloor_{n-d} \bmod m$ 

    Hence $|a^iZ_{-r_3} - t_i| < 2^{n-d+1}$. 
  • If $i \ge s_1$, $\tilde Z_i$ and $[Z_i]_d$ are only equal on the $d - (i - s_1)$ upper bits. Hence 
    $|a^iZ_{-r_3} - t_i| < 2^{n-d+1+i-s_1}$. 


Remark 1. As we use few outputs we will not treat the case where $i - r_3 \ge 0$. 
Case 1: If $s_3 \ge f_1$, we construct 
$T = (t_i)_{i < f_1}$, 
which is close to 
$(1 + a^{r_3-s_3})Z_{-r_3} \times (1, a, a^2, \dots, a^{f_1-1}) \bmod m$. 


We can see $T$ as the outputs of a Truncated Linear Congruential Generator 
of seed $(1 + a^{r_3-s_3})Z_{-r_3}$ and known multiplier $a$. So we search for the closest 
vector to $T$ in the lattice: $\{X \times (1, a, a^2, \dots, a^{f_1-1}) \bmod m \mid X \in \mathbb{Z}\}$. This 
lattice is spanned by the rows of the following matrix: 


$$\begin{pmatrix} 1 & a & a^2 & \dots & a^{f_1-1} \\ 0 & m & 0 & \dots & 0 \\ 0 & 0 & m & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & m \end{pmatrix}$$ 

The Closest Vector Problem (CVP) is finding, in a given lattice, the closest 
vector to a target vector $T$. This is usually a hard problem and we could have 
used the attack described in [8]. But here the matrix is of small dimension 
and we can solve the CVP exactly thanks to a CVP solver. We use the CVP 
solver of the fpylll library [15] for Python. 

If $f_1$ is large enough the CVP solver returns 


$(1 + a^{r_3-s_3})Z_{-r_3} \times (1, a, a^2, a^3, \dots) \bmod m$. 


We obtain $(1 + a^{r_3-s_3})Z_{-r_3}$ but not $Z_{-r_3}$ because $(1 + a^{r_3-s_3})$ is not invertible 
mod $m$. 
Case 2: If $s_3 < f_1$, we set $b = a^{-1} \bmod m$ and $a_3 = (1 + a^{r_3-s_3})$. We construct 


$T = (t_{s_3}, \dots, t_{f_1-1}, t_0, \dots, t_{s_3-1})$ 
which is close to 
$a^{s_3}Z_{-r_3} \times (1, a, a^2, \dots, a^{f_1-1-s_3}, b^{s_3}a_3, \dots, ba_3) \bmod m$. 
We search for the closest vector to $T$ in the lattice: 


$\{X \times (1, a, a^2, \dots, a^{f_1-1-s_3}, b^{s_3}a_3, \dots, ba_3) \bmod m \mid X \in \mathbb{Z}\}$. 


This lattice is spanned by the rows of the following matrix: 
$$\begin{pmatrix} 1 & a & \dots & a^{f_1-1-s_3} & b^{s_3}a_3 & b^{s_3-1}a_3 & \dots & ba_3 \\ 0 & m & \dots & 0 & 0 & 0 & \dots & 0 \\ \vdots & & \ddots & & & & & \vdots \\ 0 & 0 & \dots & 0 & 0 & 0 & \dots & m \end{pmatrix}$$ 


If $f_1$ is large enough the CVP solver returns 
$a^{s_3}Z_{-r_3} \times (1, a, a^2, \dots, a^{f_1-1-s_3}, b^{s_3}a_3, \dots, ba_3) \bmod m$ 
and we compute $Z_{-r_3}$. 
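The lattice of Case 1 can be illustrated with a small pure-Python sketch. It only builds the basis and checks that the hidden vector is an integer combination of its rows and lies within $2^{n-d}$ of the truncated target in every coordinate; the closest-vector search itself is left to a solver and is not reproduced here. All numeric values and the name `tlcg_lattice` are assumptions made for the example.

```python
def tlcg_lattice(a, m, k):
    """Rows spanning {X * (1, a, ..., a^(k-1)) mod m : X in Z}."""
    basis = [[pow(a, j, m) for j in range(k)]]
    for i in range(1, k):
        basis.append([m if j == i else 0 for j in range(k)])
    return basis

# Toy truncated LCG: n-bit words of which only the d upper bits are known.
n, d, k = 16, 6, 4
m, a, seed = 1 << n, 0x6C05, 0x1234
B = tlcg_lattice(a, m, k)
v = [seed * B[0][j] % m for j in range(k)]            # hidden lattice vector
T = [(x >> (n - d)) << (n - d) for x in v]            # target: low n - d bits erased

# v is an integer combination of the rows of B ...
coeffs = [seed] + [-(seed * B[0][j] // m) for j in range(1, k)]
recombined = [sum(coeffs[i] * B[i][j] for i in range(k)) for j in range(k)]
assert recombined == v
# ... and T is close to it in every coordinate.
assert all(0 <= x - t < 1 << (n - d) for x, t in zip(v, T))
```

When enough coordinates are used, the hidden vector is expected to be the unique lattice point that close to the target, so an exact CVP solver returns it and its first coordinate reveals the seed.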


The value $f_1$ is large enough when we have $n$ bits of correct information. If 
$n/d < s_1$, then we set $f_1 = n/d + 1$ and the $d - 1$ upper bits of the $n/d + 1$ 
computed approximations of $X_i$ are correct. If $n/d \ge s_1$ then we set $f_1$ such that 
$f_1 - 1 \times (d - f_1 - s_1) > n$. 

If we set $X_{-r_1}$, we compute $Z_{-r_3}$ or $a_3Z_{-r_3}$ by solving one CVP on a matrix 
of size $f_1 \times f_1$. 


5.2 Recovering $Y_{-r_2}$ 


We consider a parameter $f_3$ that will be the number of outputs we will use to 
recover $Y_{-r_2}$. We will set this parameter as we set $f_1$. If $n/d < s_3$, then we set 
$f_3 = n/d + 1$ and the $d - 1$ upper bits of the $n/d$ computed approximations of $Z_i$ 
are correct. If $n/d \ge s_3$ then we set $f_3$ such that $f_3 - 1 \times (d - f_3 - s_3) > n$. 

Firstly we will compute an approximation of the $n - d$ upper bits of the values 
$\{Z_0, \dots, Z_{f_3-1}\}$. 


– if $i < s_3$, then $[Z_i]_d = [a^i(1 + a^{r_3-s_3})Z_{-r_3} + f(r_3, s_3, i) \times c \bmod m]_d$ and we 
  can compute this value correctly. 

– if $i \ge s_3$, then $[Z_i]_d \approx [a^iZ_{-r_3} + g(i) \times c \bmod m]_d + [Z_{i-s_3}]_d$ and only the 
  $d - (i - s_3)$ upper bits are computed correctly. 


Remark 2. If we do not know $Z_{-r_3}$ but only $(1 + a^{r_3-s_3})Z_{-r_3}$, it means $s_3 \ge 
f_1 \ge n/d$. So $f_3 = n/d$ and $s_3 > f_3$ and we never need $Z_{-r_3}$. 


Secondly we will compute an approximation of the $n - d$ lower bits of the 
values $\{X_0, \dots, X_{f_3-1}\}$. 


– if $i < s_1$, then $X_i = (a^i(1 + a^{r_1-s_1})X_{-r_1} + f(r_1, s_1, i) \times c \bmod m) \oplus (Z'_i \gg d)$. 
– if $i \ge s_1$, then $X_i = (a^iX_{-r_1} + g(i) \times c + X_{i-s_1} \bmod m) \oplus (Z'_i \gg d)$. 


With the lower bits of the $(X_i)$ we can compute an approximation of the 
$n - d$ lower bits of the values $\{Z_0, \dots, Z_{f_3-1}\}$ knowing that $Z_i = W_i \oplus X_i$. 

Then we obtain an approximation of the $n - d$ upper bits of $\{Y_0, \dots, Y_{f_3-1}\}$ 
knowing that $Z_i = (Z_{i-r_3} + Z_{i-s_3} \bmod m) \oplus (Y'_i \gg d)$. We call these new values 
$\tilde Y_i$. 


Remark 3. When we computed the upper bits of $(Z_i)$, we only had the $d$ upper 
bits, not the $n - d$. This lack of information impacts the rest of the calculation 
and at the final step, we know there is no information in the $n - 2d$ lower bits 
of the $(\tilde Y_i)$. 


– if $i < s_2$, then $[Y_i]_d = [a^i(1 + a^{r_2-s_2})Y_{-r_2} + f(r_2, s_2, i) \times c \bmod m]_d$. We set 
  $t_i = \tilde Y_i 2^d - f(r_2, s_2, i) \times c$. 
– if $i \ge s_2$, then $[Y_i]_d = [a^iY_{-r_2} + Y_{i-s_2} + g(i) \times c \bmod m]_d$. We set $t_i = 
  (\tilde Y_i - \tilde Y_{i-s_2})2^d - g(i) \times c$. 


Here the dependences between the different values are harder to make explicit. 
For example, in the case where $i < \min(s_1, s_2, s_3)$, we can compute the $d$ upper 
bits of $Z_i$ correctly. Thanks to that we can compute the $d$ upper bits of $\lfloor X_i \rfloor_{n-d}$ 
correctly. We obtain directly the $d$ upper bits of $\lfloor Z_i \rfloor_{n-d}$ with $Z_i = W_i \oplus X_i$. 
The last step is obtaining the $d$ upper bits of $Y'_i \gg d$. At this point there is an 
addition so we might lose one bit of precision because of a carry. We obtain 
that $|a^i(1 + a^{r_2-s_2})Y_{-r_2} - t_i| < 2^{n-d+1}$. 

Case 1: If $s_2 \ge f_3$, we construct 
$T = (t_i)_{i < f_3}$ 


which is close to $(1 + a^{r_2-s_2})Y_{-r_2} \times (1, a, a^2, \dots) \bmod m$. We search 
for the closest vector to $T$ in the lattice: 


$\{X \times (1, a, a^2, \dots) \bmod m \mid X \in \mathbb{Z}\}$. 


228 F. Martinez 


This lattice is spanned by the rows of the following matrix: 

$$\begin{pmatrix} 1 & a & a^2 & \dots \\ 0 & m & 0 & \dots \\ 0 & 0 & m & \dots \\ \vdots & & & \ddots \end{pmatrix}$$ 

The CVP solver returns $(1 + a^{r_2-s_2})Y_{-r_2} \times (1, a, a^2, \dots) \bmod m$. We 
cannot compute $Y_{-r_2}$ because $(1 + a^{r_2-s_2})$ is not invertible mod $m$. 
Case 2: If $s_2 < f_3$, we set $b = a^{-1} \bmod m$ and $a_2 = (1 + a^{r_2-s_2})$. We construct 


$T = (t_{s_2}, \dots, t_{f_3-1}, t_0, \dots, t_{s_2-1})$ 
which is close to 
$a^{s_2}Y_{-r_2} \times (1, a, a^2, \dots, a^{f_3-1-s_2}, b^{s_2}a_2, \dots, ba_2) \bmod m$ 
and we search for the closest vector to $T$ in the lattice: 
$\{X \times (1, a, a^2, \dots, a^{f_3-1-s_2}, b^{s_2}a_2, \dots, ba_2) \bmod m \mid X \in \mathbb{Z}\}$. 
This lattice is spanned by the rows of the following matrix: 


$$\begin{pmatrix} 1 & a & \dots & a^{f_3-1-s_2} & b^{s_2}a_2 & b^{s_2-1}a_2 & \dots & ba_2 \\ 0 & m & \dots & 0 & 0 & 0 & \dots & 0 \\ \vdots & & \ddots & & & & & \vdots \\ 0 & 0 & \dots & 0 & 0 & 0 & \dots & m \end{pmatrix}$$ 


The CVP solver returns $a^{s_2}Y_{-r_2} \times (1, a, a^2, \dots, a^{f_3-1-s_2}, b^{s_2}a_2, 
\dots, ba_2) \bmod m$ and we can compute $Y_{-r_2}$. 


Once again, for a set $X_{-r_1}$ we only solve one CVP to compute $Y_{-r_2}$ or $a_2Y_{-r_2}$. 


Remark 4. We will not detail here how we recover $Z_{-r_3}$ and/or $Y_{-r_2}$ in the 
cases where we only have $(1 + a^{r_3-s_3})Z_{-r_3}$ and/or $(1 + a^{r_2-s_2})Y_{-r_2}$ because it 
does not involve interesting techniques. It only uses modular arithmetic and 
does not need other guesses or resource-consuming operations. 

This attack is fully implemented in SageMath but cannot run on a laptop as 
it needs to solve $2^{64} \times 2$ CVPs. 
Abstract. The guess-and-determine technique is one of the most widely 
used techniques in cryptanalysis to recover unknown variables in a given 
system of relations. A subset of the unknown variables is guessed such that 
the remaining unknowns can be deduced using the relations. Applications 
include state recovery for stream ciphers and key-bridging in key-recovery 
attacks on block ciphers. Since the attack complexity depends on the num- 
ber of guessed variables, it is essential to find small guess bases. 

In this paper, we present Autoguess, an easy-to-use tool to search for 
a minimal guess basis. We propose several new modeling techniques to 
harness SAT/SMT, MILP, and Gröbner basis solvers. We demonstrate 
their usefulness in guess-and-determine attacks on stream ciphers and 
block ciphers, as well as finding key-bridges for block ciphers. Moreover, 
integrating our CP models for the key-bridging technique into the pre- 
vious CP-based frameworks to search for distinguishers, we propose a 
unified and general CP model to find key-recovery-friendly distinguish- 
ers for both linear and nonlinear key schedules. 


Keywords: Guess & determine · CP · MILP · SAT · Gröbner basis


1 Introduction 


The practical security of symmetric-key cryptographic primitives with respect 
to known attacks is ensured by extensive cryptanalysis. There is a wide variety 
of different cryptanalytic techniques, including differential, linear, and integral 
cryptanalysis, and more. Many of these involve tracing the propagation of certain 
cryptographic properties at the bit-level, which can be highly nontrivial. From a 
designer’s perspective, designing a primitive requires the analysis with all these 
known techniques. Thus, the design and cryptanalysis of symmetric-key primi- 
tives is a time-consuming and error-prone process. Therefore, it is of significant 
importance for the community to develop automatic methods and tools. 

One of the most widely used techniques in cryptanalysis is the
guess-and-determine (GD) technique, especially when only low amounts of data are
available to the attacker. GD recovers the unknown variables in a given system of
relations on a set of variables: A subset of the unknown variables is guessed such
that the remaining unknowns can be deduced using the information from the
guessed variables. The correctness of the guesses can also be checked using the
given relations, since incorrect guesses are assumed to yield inconsistencies.

© Springer Nature Switzerland AG 2022

This approach can be used in various areas of cryptanalysis. For instance,
it can be applied to recover the internal state of a stream cipher from a
sufficient amount of output data, or the state and secret key of a block cipher from
plaintext/ciphertext pairs. Another important application is the key-bridging
technique in key-recovery attacks on block ciphers, where the attacker aims to
find the involved sub-keys based on the relations induced by the key schedule.
Beyond these cryptanalytic uses, the GD technique also finds application in a 
broader mathematical context, for example in its links to uniquely restricted 
matching problems in graph theory [14]. In these applications, the complexity 
of the GD technique is directly dependent on the number of guessed variables. 
It is thus essential to find the smallest possible subset of guessed variables from 
which the remaining variables can be determined efficiently. 

In this paper, we provide a general tool to search for a suitable set of guessed 
variables with minimum size. This tool allows designers of symmetric-key prim- 
itives to easily and thoroughly analyze their designs from the GD attack point 
of view. Additionally, our tool can help designers to optimize their key schedule 
algorithms with respect to the key-bridging technique. 

Our contributions can be summarized as follows: 


1. We present Autoguess, an easy-to-use open-source tool which integrates a wide
range of CP/SMT/SAT/MILP solvers as well as the Gröbner basis algorithm
to automate GD attacks and the key-bridging technique. Autoguess is publicly
available at https://github.com/hadipourh/autoguess.

2. We introduce new encodings in CP and SAT/SMT to formulate the GD
attack, which achieve better performance than the MILP encoding [3]
in many cases, particularly when searching for feasible solutions. In contrast to
previous models [3,7], where all variables must be deduced from the guessed
variables, our reformulation takes an arbitrary subset of variables as the target
variables into account. This enables us to extend the application to the
key-bridging technique, where only an arbitrary subset of variables needs to be
deduced. Additionally, we adapt the method introduced by Danner et al. [7]
to translate GD attacks to the problem of computing the Gröbner basis of a
Boolean ideal, and extend it to the key-bridging technique as well.

3. Using Autoguess to search for key bridges in bit-oriented block ciphers with
nonlinear key schedules, we reduce the time complexity of the analysis phase
in the linear attack on 26-round PRESENT-80 presented at EUROCRYPT 2020
[11] from $2^{65}$ to $2^{64}$. In addition, we show that our tool can automatically
rediscover many of the best results obtained with the key-bridging technique,
which previously had to be generated either manually or with dedicated,
cipher-specific tools. For example, we automatically rediscovered
the integral attack on 24-round LBlock [6].

4. To show the application of our tool in the analysis of stream ciphers, we use
it to reduce the computational complexity of the GD attack on ZUC from $2^{392}$
[10] to $2^{390}$ while using the same amount of 9 keystream output words.

Table 1. Summary of our attacks on SKINNY-128-256, SKINNY-64-192, SKINNY-64-128,
and TWINE-80, where DS-MITM denotes Demirci-Selçuk meet-in-the-middle
cryptanalysis and ST stands for the single-tweakey setting.


Cipher          #Rounds  Data       Memory      Time        Attack   Setting  Reference
SKINNY-128-256  19       2^{96} CP  2^{210.99}  2^{238.26}  DS-MITM  ST       Sect. 9.1
SKINNY-64-192   21       2^{60} CP  2^{133.99}  2^{186.63}  DS-MITM  ST       Sect. 9.1
SKINNY-64-128   18       2^{32} CP  2^{61.91}   2^{126.32}  DS-MITM  ST       Sect. 9.1
TWINE-80        20       2^{32} CP  2^{62.91}   2^{76.92}   DS-MITM  -        Sect. 9.2
TWINE-80        20       2^{32} CP  2^{82.91}   2^{77.44}   DS-MITM  -        [18]


5. To show the versatility of our tool, we also used it for finding
low-data-complexity attacks on block ciphers. More precisely, we used it to find GD
attacks on AES, CRAFT, and SKINNY. For example, for AES, we could
rediscover the best previous GD attack on 3 rounds with a data complexity of
merely one known plaintext/ciphertext pair.

6. We show that our CP-based approaches for the key-bridging technique are
consistent with the previous CP-based frameworks to search for distinguishers.
Hence, we integrate them into the previous CP-based frameworks for the
automatic search of distinguishers to build a general CP model that finds
key-recovery-friendly distinguishers, taking key-bridging into account for both
linear and nonlinear key schedules. To show the usefulness of this new method,
as can be seen in Table 1, we improve the memory complexity of the
best previous DS-MITM attack on 20 rounds of TWINE-80 by a factor of
$2^{20}$. We also used this new framework to find DS-MITM attacks on
SKINNY-64-128, SKINNY-64-192, and SKINNY-128-256 for the first time.


Full Version. The full version of this paper [12] provides details on all applica- 
tions. 


Outline. In Sect. 2, we recall the preliminaries on GD attacks and the
key-bridging technique. In Sect. 3, we propose the constraint programming model
of these two techniques, and in Sect. 4, we discuss an alternative model using
Gröbner bases. In Sect. 5, we introduce our tool Autoguess with its preprocessing
and early-abort techniques. We apply it to find key bridges for different ciphers
in Sect. 6 as well as GD attacks on block ciphers in Sect. 7 and stream ciphers
in Sect. 8. Finally, we provide a discussion in Sect. 10.


2 Preliminaries 


In this section, we provide a brief overview of the cryptanalytic background.
We denote the integer range $i$ to $j$ by $i \sim j$. We use $\neg, \wedge, \vee, \oplus$ to denote
bitwise NOT, AND, OR, XOR and $\|$ for concatenation. For a fixed word size,
$\lll i$ and $\ggg i$ denote left rotation and right rotation by $i$ bits, and $\boxplus, \boxminus$ denote
modular addition and subtraction, respectively. In SMT models, BVZExt(x, n)
is the zero-extension of $x$ by $n$ bits as $0 \| \cdots \| 0 \| x$, BVAdd(x, y) is the bit-vector addition
of $x$ and $y$, and BVULE(x, y) is an unsigned $\le$ comparison of two bit-vectors $x$
and $y$. In the GD context, we use $x \Rightarrow y$ to indicate that $y$ can be deduced from $x$.


2.1 Guess-and-Determine Technique 


The GD technique is a general method to solve a system of equations, given 
as a set of variables linked by relations. In this method, the values of a subset 
of the variables are guessed first. Next, using the relations, one may find the 
values of a subset of the remaining unknown variables, which is called knowledge 
propagation. If all of the remaining unknown variables are determined from the 
guessed variables, we call the set of guessed variables a guess basis [2]. 

For systems of equations obtained from cryptographic primitives, the GD
technique is often used when data is very scarce, and statistical attacks are
therefore impossible. The main challenges of a GD attack are to find a suitable
guess basis and to effectively propagate knowledge. Since the complexity of GD
attacks depends crucially on the size of the guess basis, the main goal is to find
a guess basis of minimal size, and several improvements and approaches have
been proposed to address this problem. For instance, Ahmadi and Eghlidos [1] proposed
a heuristic approach based on dynamic programming to automatically find GD
attacks for classes of stream ciphers.


2.2 Key-Bridging Technique 


Key-bridging is a technique to optimize the key-recovery process in attacks on 
block ciphers. In such attacks, a core distinguisher can often be extended by addi- 
tional initial and final key-recovery rounds, where an attacker guesses selected 
round key bits to verify the distinguisher. Key-bridging attempts to minimize 
the number of guessed key bits using dependencies in the key schedule. 

An interesting automated approach to search for key bridges was introduced
by Lin et al. at FSE 2016 [16]. However, their approach cannot handle certain
cryptographic operations like modular addition and provides only limited
output: it derives a bound on the number of solutions, but not the actual
guess basis or determination flow. Moreover, their tool is based on a dedicated
linear algebraic method and is hence not consistent with the CP-based approaches
to search for distinguishers, whereas, as we will show in Sect. 9, our CP-based
approaches for key-bridging can be merged into the previous CP-based tools.


2.3 Connection Relations 


We can describe the GD technique using two types of connection relations [3]. 


Definition 1 (Implication Relation). Let $x_0, \dots, x_{n-1}, y$ denote some
variables. If $y$ can be uniquely determined from $x_0, \dots, x_{n-1}$, we say they have an
implication relation $r$ and denote $\mathrm{LHS}(r) = \{x_0, \dots, x_{n-1}\}$ and $\mathrm{RHS}(r) = \{y\}$:


$r : x_0, \dots, x_{n-1} \Rightarrow y$.


Table 2. Modeling a cipher using connection relations, where $x_0, x_1, \dots, x_{n-1}, y \in \mathbb{F}_2^m$

Equation or prerequisite                                          Connection relation
Xor: $y = \bigoplus_{i=0}^{n-1} x_i$                              $[x_0, \dots, x_{n-1}, y]$
And: $y = \bigwedge_{i=0}^{n-1} x_i$                              $x_0, \dots, x_{n-1} \Rightarrow y$
Modular Addition: $y = \boxplus_{i=0}^{n-1} x_i$                  $[x_0, \dots, x_{n-1}, y]$
S-Box: $y = F(x)$ with $F : \mathbb{F}_2^m \to \mathbb{F}_2^m$    $[x, y]$
Concatenation: $x = x_0 \| \dots \| x_{n-1}$                      $x_0, \dots, x_{n-1} \Rightarrow x$; $\forall\, 0 \le i \le n-1 : x \Rightarrow x_i$
Elimination: If $[x_0, \dots, x_{n-1}, x] \wedge [x, y_0, \dots, y_{n-1}]$, then $[x_0, \dots, x_{n-1}, y_0, \dots, y_{n-1}]$


Definition 2 (Symmetric relation). Let $x_0, \dots, x_{n-1}$ denote $n$ variables. We
say they have a symmetric relation $r$ with $|r| = n$ if and only if each variable $x_i$
can be uniquely deduced when the remaining $n - 1$ variables are all known:


$r : [x_0, \dots, x_{n-1}]$.


We can model a cipher using a combination of implication and symmetric 
relations by applying rules such as those illustrated in Table 2. 
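
As an illustration of the two relation types, the following Python sketch stores a relation as a set of variable names plus an optional right-hand side and applies the Xor and And rules of Table 2. The names `Relation`, `xor_relation`, `and_relation`, and `can_derive` are hypothetical and chosen for this example only; they are not taken from Autoguess.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Relation:
    """A connection relation over variable names (hypothetical representation).

    Symmetric relation [x0, ..., xn-1]: `rhs` is None, `variables` holds all xi.
    Implication x0, ..., xn-1 => y: `variables` is the LHS and `rhs` is y.
    """
    variables: FrozenSet[str]
    rhs: Optional[str] = None

    def can_derive(self, target: str, known: set) -> bool:
        """True if `target` follows from `known` through this relation alone."""
        if self.rhs is None:  # symmetric: any variable from all the others
            return target in self.variables and self.variables - {target} <= known
        return target == self.rhs and self.variables <= known  # implication


def xor_relation(inputs, output):
    # Xor rule of Table 2: y = x0 ^ ... ^ xn-1 gives [x0, ..., xn-1, y].
    return Relation(frozenset(inputs) | {output})


def and_relation(inputs, output):
    # And rule of Table 2: y = x0 & ... & xn-1 gives x0, ..., xn-1 => y.
    return Relation(frozenset(inputs), rhs=output)


xor_rel = xor_relation(["a", "b"], "c")
and_rel = and_relation(["a", "b"], "d")
print(xor_rel.can_derive("a", {"b", "c"}))  # an input of a Xor can be recovered
print(and_rel.can_derive("a", {"b", "d"}))  # an input of an And cannot
```

The asymmetry between the two calls is exactly the difference between a symmetric and an implication relation.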


2.4 A Naive Guess-and-Determine Approach 


Assume we have a system of connection relations involving $n$ unknown variables
and are looking for a guess basis of minimum size. A naive approach is exhaustive
search, i.e., checking each possible subset of each possible size $k$, $1 \le k \le n$, to
discover a minimal subset that is a guess basis. To check whether a subset $K$ of
size $k$ can be a guess basis, we assume that all variables in $K$ are known and
apply knowledge propagation through the given connection relations to update
the set of known variables. When the subsets are checked in order of increasing
size, a minimal guess basis is found as soon as a set of known variables is found
from which all of the remaining variables can be deduced. The
complexity of the exhaustive search for a guess basis of size less than or equal
to $m$ (if it exists) is roughly $\sum_{i=1}^{m} \binom{n}{i}$, which is exponential in both $n$ and $m$.
Thus, this approach is infeasible when $m$ or $n$ is large enough.
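
The naive approach can be sketched in a few lines of Python. The helper names `propagate` and `naive_guess_basis` and the toy system are assumptions made for this illustration; symmetric relations are given as tuples of variable names and implication relations as `(lhs, rhs)` pairs.

```python
from itertools import combinations


def propagate(known, symmetric, implications):
    """Close `known` under the relations (knowledge propagation)."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for rel in symmetric:              # [x0, ..., xn-1]
            missing = set(rel) - known
            if len(missing) == 1:          # all but one variable known
                known |= missing
                changed = True
        for lhs, rhs in implications:      # x0, ..., xn-1 => y
            if set(lhs) <= known and rhs not in known:
                known.add(rhs)
                changed = True
    return known


def naive_guess_basis(variables, symmetric, implications):
    """Try subsets in order of increasing size; the first hit is minimal."""
    variables = list(variables)
    for k in range(1, len(variables) + 1):
        for guess in combinations(variables, k):
            if propagate(guess, symmetric, implications) == set(variables):
                return set(guess)


# Toy system (made up for illustration): c = a ^ b, d = b ^ c, e = a & d.
X = ["a", "b", "c", "d", "e"]
symmetric = [("a", "b", "c"), ("b", "c", "d")]
implications = [(("a", "d"), "e")]
basis = naive_guess_basis(X, symmetric, implications)
print(len(basis), sorted(basis))
```

The outer loop visits up to $\sum_{k} \binom{n}{k}$ subsets, which is what makes this search impractical beyond toy sizes.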


3 Constraint Programming for GD and Key-Bridging 


3.1 Modelling Knowledge Propagation 


Two main challenges of the GD technique are knowledge propagation and find- 
ing a minimal guess basis [2]. Finding a minimal guess basis is an optimization 
problem, but can be transformed into a sequence of decision problems whether 
a guess basis of a specified size exists. We thus need to model knowledge propa- 
gation. We consider variables which can either be unknown or known, and their 
state may change from unknown to known during the guessing sequence. 

Let $(X, R)$ be a system of connection relations, where $X = \{x_0, \dots, x_{n-1}\}$
and $R = \{r_0, \dots, r_{m-1}\}$. Assuming that a subset of variables $K_0 \subseteq X$ is
initially known, the known/unknown status of each single variable $x \in X$ in each
step can be represented by a new binary decision variable, the state variable.


Definition 3 (State variables). For a given system of connection relations
$(X, R)$, where $X = \{x_0, \dots, x_{n-1}\}$, let the set of binary decision variables
$S_j = \{x_{0,j}, \dots, x_{n-1,j}\}$ represent the status of variables in the $j$th step of knowledge
propagation, where $x_{i,j} = 1$ if $x_i$ is known at step $j$ and $x_{i,j} = 0$ otherwise, for
$0 \le i \le n-1$ and $j \in \mathbb{Z}_{\ge 0}$.


For a given initial subset $K_0 \subseteq X$ of known variables, the knowledge propagation
can be represented as a chain $S_0 \to S_1 \to \cdots \to S_j \to \cdots$.

Given that a variable can be involved in more than one connection relation, 
we define path variables to link each variable to its corresponding relations. 


Definition 4 (Path variables). Let $(X, R)$ be a system of connection relations
with $R = \{r_0, \dots, r_{m-1}\}$. Assume $x_i \in X$ appears in $\lambda$ relations $\{r_{i_0}, \dots, r_{i_{\lambda-1}}\}$,
where for each $0 \le k \le \lambda-1$, $r_{i_k}$ is either a symmetric relation or an implication
relation with $x_i \in \mathrm{RHS}(r_{i_k})$. Then, for each step $j$ of knowledge propagation, $\lambda$
new binary decision variables $\mathrm{Path}(x_{i,j}) := \{x_{i,j,k} : 0 \le k \le \lambda-1\}$ are defined
as


$$x_{i,j,k} = \begin{cases} 1 & \text{if } x_i \text{ can be determined from the relation } r_{i_k} \text{ at step } j-1, \\ 0 & \text{otherwise,} \end{cases}$$


where $0 \le i \le n-1$ and $j \in \mathbb{Z}_{\ge 1}$. $\mathrm{Path}(x_{i,j})$ is called the set of path variables
corresponding to $x_i \in X$ at the $j$th step of knowledge propagation. For $j = 0$ and
all $0 \le i \le n-1$, $\mathrm{Path}(x_{i,0}) = \emptyset$.


Proposition 1 (Knowledge propagation). Let $(X, R)$ be a system of
connection relations and $\mathrm{Path}(x_{i,j}) = \{x_{i,j,k} : 0 \le k \le \lambda-1\}$ as defined above.
Then $x_{i,j} = 1$ if and only if at least one of the following conditions holds:


- Already known: $x_{i,j-1} = 1$, i.e., $x_i$ has been known since the previous steps,
- Determined: There exists $x_{i,j,k} \in \mathrm{Path}(x_{i,j})$ such that $x_{i,j,k} = 1$, i.e., $x_i$
can be determined from the previously known variables.


For a given system of connection relations $(X, R)$ and a subset of known
variables, any assignment for the state and path variables satisfying the definitions
of state and path variables as well as Proposition 1 corresponds to a valid
knowledge propagation. For a valid assignment of state and path variables, let
$K_j := \{x_{i,j} \in S_j : x_{i,j} = 1\}$. According to the first condition in Proposition 1, if
$x_{i,j} = 1$, then $x_{i,j'} = 1$ for all $j' \ge j$, where $0 \le i \le n-1$, since a variable remains
known after it becomes known once. As a consequence, $K_0 \subseteq \cdots \subseteq K_j \subseteq \cdots$,
where $j \in \mathbb{Z}_{\ge 0}$, is an ascending chain. On the other hand, the number of known
variables is upper bounded by $|X| = n$. Therefore, according to the ascending
chain condition, there exists a positive integer $\beta$ such that $K_\beta = K_{\beta+1} = \cdots$.

While in the GD technique one usually looks for a minimal guess basis to
deduce all of the remaining variables, in the key-bridging technique we are
looking for a minimal set of guessed variables to determine a certain subset of
variables, which are the sub-keys involved in the key recovery. Accordingly, we define
the concept of a guess basis for a subset $T \subseteq X$ of target variables.


Definition 5 (Guess basis). Let $(X, R)$ be a system of connection relations
and $T \subseteq X$. The subset $K \subseteq X$ is called a guess basis for $T$ if there exists some
positive integer $\beta$ such that all variables in $T$ can be deduced from $K$ after $\beta$
steps of knowledge propagation.


Using the following proposition, one can characterize the guess basis for a 
given system of connection relations and a subset of target variables. 


Proposition 2 (Characterizing guess basis). Let $(X, R)$ be a system of
connection relations and $S_0 \to \cdots \to S_j \to \cdots$ be its chain of state variables.
$K_0 \subseteq X$ is a guess basis for $T \subseteq X$ if there exists a positive integer $\beta$ and an
assignment of state and path variables for which the following conditions hold:


- For all $x_{i,0} \in S_0$, if $x_i \in K_0$, then $x_{i,0} = 1$, and $x_{i,0} = 0$ otherwise.

- The assignment satisfies the definition of state and path variables and
Proposition 1.

- For all $x_{i,\beta} \in S_\beta$, if $x_i \in T$, then $x_{i,\beta} = 1$, i.e., all target variables should be
known in the final step of knowledge propagation.
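
The step-wise view of Propositions 1 and 2 can be made concrete with a short Python sketch that computes the chain of known sets $K_0, K_1, \dots$ for a fixed depth. The function names, the `derivers` dictionary (for each variable, the sets of variables from which it can be determined), and the toy system are assumptions made for this illustration.

```python
def propagation_chain(variables, derivers, k0, beta):
    """Return [K_0, ..., K_beta] following Proposition 1.

    `derivers[x]` lists the variable sets from which x can be determined
    (one set per symmetric or implication relation that can yield x).
    """
    chain = [set(k0)]
    for _ in range(beta):
        prev = chain[-1]
        nxt = set(prev)                                  # "already known"
        for x in variables:
            if any(set(src) <= prev for src in derivers.get(x, [])):
                nxt.add(x)                               # "determined"
        chain.append(nxt)
    return chain


def is_guess_basis(variables, derivers, k0, targets, beta):
    """Check the conditions of Proposition 2 for a fixed depth beta."""
    return set(targets) <= propagation_chain(variables, derivers, k0, beta)[-1]


# Toy system (made up for illustration): [a, b, c], [b, c, d], and a, d => e.
X = ["a", "b", "c", "d", "e"]
derivers = {
    "a": [{"b", "c"}],
    "b": [{"a", "c"}, {"c", "d"}],
    "c": [{"a", "b"}, {"b", "d"}],
    "d": [{"b", "c"}],
    "e": [{"a", "d"}],
}
chain = propagation_chain(X, derivers, {"a", "b"}, beta=4)
for j, k_j in enumerate(chain):
    print(j, sorted(k_j))
print(is_guess_basis(X, derivers, {"a", "b"}, {"e"}, beta=2))
print(is_guess_basis(X, derivers, {"a", "b"}, {"e"}, beta=3))
```

In this toy system the chain stabilizes after three steps, so a depth of 2 is too small to reach the target `e` while a depth of 3 suffices; this is the role played by $\beta$.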


3.2 Encoding Using CP 


Proposition 3 (Link from path to state variables in CP encoding).
Let $x_{i,j,k}$ be a path variable corresponding to the state variable $x_{i,j}$ in connection
relation $r_{i_k}$. Assume that the variables of $r_{i_k}$ are $x_i$ and $x_{i_0}, \dots, x_{i_{p-1}}$ for some
$p \in \mathbb{Z}_{\ge 1}$. Then, the link between $x_{i,j,k}$ and the state variables $x_{i_0,j-1}, \dots, x_{i_{p-1},j-1}$
is encoded as follows, where $j \in \mathbb{Z}_{\ge 1}$, and $r_{i_k}$ is either a symmetric relation, or
an implication relation such that $x_i \in \mathrm{RHS}(r_{i_k})$:


$x_{i,j,k} = x_{i_0,j-1} \wedge \cdots \wedge x_{i_{p-1},j-1}$.


Proposition 4 (Link from state to path variables in CP encoding). Let
$\mathrm{Path}(x_{i,j})$ be the set of path variables $\{x_{i,j,k} : 0 \le k \le \lambda-1\}$ corresponding to
the state variable $x_{i,j}$. The link between $x_{i,j}$ and $\mathrm{Path}(x_{i,j})$ can be encoded as


$x_{i,j} = x_{i,j-1} \vee x_{i,j,0} \vee \cdots \vee x_{i,j,\lambda-1}$.


For a given system of connection relations $(X, R)$, let $T \subseteq X$ be the set of
target variables for which we are looking for a minimal guess basis. To encode
this problem into a CP model, we first consider a fixed positive integer value
for $\beta$ as the depth of knowledge propagation and then generate the state and
path variables corresponding to $\beta$ steps of knowledge propagation. Assume that
the chain $S_0 \to \cdots \to S_\beta$ represents the knowledge propagation through $\beta$ steps,
where $S_j = \{x_{i,j} : 0 \le i \le n-1\}$. Then, we set the objective function as follows:


$$\min \sum_{i=0}^{n-1} x_{i,0}$$


such that $x_{i,\beta} = 1$ for all $x_i \in T$, and all CP constraints linking the state and
path variables (Propositions 3 and 4) are satisfied. If the constructed CP model
is satisfiable, then the set $M := \{x_i \in X : x_{i,0} = 1\}$ is a guess basis for $T$.
By the ascending chain rule, the set $M$ converges to a minimal guess basis for
$T$ when $\beta$ is large enough. Algorithm 1 summarizes the CP encoding. With a
similar approach, we show in the full version of our paper how to encode the
search for guess bases in SMT/SAT languages.


Algorithm 1: CP Encoding (Sect. 3.2)


Input: System of connection relations (X, R), where X = {x_0, ..., x_{n-1}}, a set of target
       variables T ⊆ X, the depth β ∈ Z_{≥1} of knowledge propagation
Output: A sufficient subset G ⊆ X for T


Initialize a dictionary Deriver with keys(Deriver) = X and a CP model M;
for i = 0 → n − 1 do
    Deriver[x_i] ← [{x_i}];
    for r ∈ R do
        if r is a symmetric relation and x_i ∈ r then
            Deriver[x_i] ← Deriver[x_i] ∪ [{v ∈ r : v ≠ x_i}];
        if r is an implication relation and x_i ∈ RHS(r) then
            Deriver[x_i] ← Deriver[x_i] ∪ [LHS(r)];
for j = 0 → β − 1 do
    for i = 0 → n − 1 do
        M.var ← {x_{i,j+1}};
        λ ← |Deriver[x_i]|;
        for k = 0 → λ − 1 do
            Let Deriver[x_i][k] = {x_{i_0}, ..., x_{i_{p-1}}};
            M.var ← {x_{i,j+1,k}} ∪ {x_{i_0,j}, ..., x_{i_{p-1},j}};
            M.con ← x_{i,j+1,k} = ∧_{l=0,...,p-1} x_{i_l,j}      ▷ Link path to state variables
        M.con ← x_{i,j+1} = ∨_{k=0,...,λ-1} x_{i,j+1,k}          ▷ Link state to path variables
for x_i ∈ T do
    M.con ← x_{i,β} = 1                                          ▷ Target variables must be known in final step
M.obj ← min Σ_{i=0}^{n-1} x_{i,0}                                ▷ Objective function
solution ← M.solve                                               ▷ Call a CP solver


return G = {x_i ∈ X | x_{i,0} = 1};
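
To make the structure of the model tangible, the following Python sketch walks through the same loops as Algorithm 1 and emits the resulting constraints as MiniZinc text. It is only an illustration: the function name `cp_model`, the variable naming scheme (`a_1` for a state variable, `a_1_p0` for a path variable), and the toy relation are assumptions, and the text produced by Autoguess itself may look different.

```python
def cp_model(variables, symmetric, implications, targets, beta):
    """Emit the CP model of Algorithm 1 as MiniZinc text (illustrative only)."""
    # Deriver[x] starts with {x} itself, so "already known" is path 0.
    deriver = {x: [[x]] for x in variables}
    for rel in symmetric:
        for x in rel:
            deriver[x].append([v for v in rel if v != x])
    for lhs, rhs in implications:
        deriver[rhs].append(list(lhs))

    lines = [f"var bool: {x}_0;" for x in variables]
    for j in range(beta):
        for x in variables:
            lines.append(f"var bool: {x}_{j + 1};")
            paths = []
            for k, sources in enumerate(deriver[x]):
                path = f"{x}_{j + 1}_p{k}"
                conj = " /\\ ".join(f"{v}_{j}" for v in sources)
                lines.append(f"var bool: {path};")
                # Link path to state variables of the previous step.
                lines.append(f"constraint {path} <-> ({conj});")
                paths.append(path)
            disj = " \\/ ".join(paths)
            # Link state to path variables.
            lines.append(f"constraint {x}_{j + 1} <-> ({disj});")
    for t in targets:
        lines.append(f"constraint {t}_{beta};")  # target known in final step
    cost = " + ".join(f"bool2int({x}_0)" for x in variables)
    lines.append(f"solve minimize {cost};")
    return "\n".join(lines)


# Toy input (made up): one symmetric relation [a, b, c], target c, depth 1.
model = cp_model(["a", "b", "c"], [("a", "b", "c")], [], ["c"], beta=1)
print(model)
```

The number of generated variables and constraints grows linearly with the depth $\beta$, which is why the depth has to be fixed before the solver is called.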


4 From Guess Basis to Gröbner Basis


In this section, we briefly recall the method introduced in [7] to translate the GD
problem to the problem of computing the reduced Gröbner basis of a Boolean
polynomial ideal, and also modify it to take the target variables and known
variables into account, allowing us to model the key-bridging problem as well.


Given a system of connection relations $(X, R)$, where $X = \{x_0, \dots, x_{n-1}\}$,
without loss of generality we can assume that all relations are implication
relations. For each implication relation $x_{i_0}, \dots, x_{i_{n-2}} \Rightarrow x_{i_{n-1}}$, we replace each
variable $x_{i_j}$ with a Boolean variable $X_{i_j}$ representing whether it is known. This
yields the logical formula $(\neg X_{i_0} \vee \dots \vee \neg X_{i_{n-2}} \vee X_{i_{n-1}})$. Accordingly, a system
of connection relations is translated to a CNF formula. Let $C$ be the derived
CNF with $n$ binary variables, and let $\mathrm{Sat}(C)$ be the subset of $\{0,1\}^n$ including all
solutions of $C$. It is well known that $C$ can be represented via a set of Boolean
polynomials $F$ such that $\mathrm{Sat}(C) = Z(F)$, where $Z(F)$ denotes the solution
set of $F$. To do so, every Horn clause $(\neg X_{i_0} \vee \dots \vee \neg X_{i_{n-2}} \vee X_{i_{n-1}})$ is
translated to a binomial $x_{i_0} \cdots x_{i_{n-2}} \cdot (x_{i_{n-1}} + 1)$ in the Boolean polynomial ring
$\mathbb{F}_2[x_0, \dots, x_{n-1}] / \langle x_0^2 + x_0, \dots, x_{n-1}^2 + x_{n-1} \rangle$. Thus, every system of connection
relations can be translated to a set of Boolean binomials. The following proposition
represents the relation between a minimal guess basis for a system of connection
relations and the reduced Gröbner basis of its algebraic representation. While the
CP approaches require specifying the depth of knowledge propagation, this is not
necessary in the Gröbner basis approach.

Proposition 5 (Link between a guess basis and Gröbner basis [7]). Let
$(X, R)$ be a system of connection relations where $X = \{x_0, \dots, x_{n-1}\}$, and
let $K, T \subseteq X$ include the known and target variables, respectively. Let $F$ be the
set of Boolean binomials in $\mathbb{F}_2[x_0, \dots, x_{n-1}] / \langle x_0^2 + x_0, \dots, x_{n-1}^2 + x_{n-1} \rangle$ as the
algebraic representation of $(X, R)$. Besides, assume that $\sigma$ is a degree-compatible
term ordering and $J$ is the ideal generated by $F \cup \{k + 1 : k \in K\}$. Next, compute
the reduced $\sigma$-Gröbner basis of $J + \langle t \mid t \in T \rangle$. Then every monomial $x_{i_0} \cdots x_{i_{m-1}}$
of smallest degree in this reduced Gröbner basis corresponds to a guess basis
$G = \{x_{i_0}, \dots, x_{i_{m-1}}\}$ of minimal length.
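
As a small sanity check of this statement, consider a toy system that is not taken from the paper: the single implication relation $x_0 \Rightarrow x_1$, no known variables ($K = \emptyset$), and the target set $T = \{x_1\}$. Under these assumptions the computation can be done by hand:

```latex
\begin{align*}
  F &= \{\, x_0 (x_1 + 1) \,\} = \{\, x_0 x_1 + x_0 \,\},
  \qquad J = \langle x_0 x_1 + x_0 \rangle, \\
  J + \langle x_1 \rangle
    &= \langle x_0 x_1 + x_0,\; x_1 \rangle
     = \langle x_0,\; x_1 \rangle,
  \quad \text{since } x_0 = (x_0 x_1 + x_0) + x_0 \cdot x_1 \text{ over } \mathbb{F}_2 .
\end{align*}
```

The reduced Gröbner basis is $\{x_0, x_1\}$. Both monomials have degree 1, and indeed both $\{x_0\}$ and $\{x_1\}$ are guess bases of size one for $T$: either $x_1$ is guessed directly, or it is deduced from $x_0$.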


5 Autoguess 


We developed Autoguess, an easy-to-use tool that implements these techniques 
to find GD attacks and key bridges. It receives a text file including the system of 
relations, target and known variables, and if applicable the depth of knowledge 
propagation as input, and outputs a guess basis of minimum size. Autoguess 
supports all encoding methods including CP, MILP, SMT, and SAT and allows 
the user to choose from many state-of-the-art solvers. It also supports the Gröb- 
ner encoding method, which has the advantage that the user does not need to 
specify the depth of knowledge propagation. The output of Autoguess not only 
represents the guessed variables but also includes the determination flow which 
illustrates how the target variables can be determined from the guessed vari- 
ables. Autoguess uses graphviz to generate a directed graph visualizing the 
determination flow. 

Figure 1 gives a high-level overview of the program flow in Autoguess. For
Gröbner bases, we use SageMath’s direct interface to efficient algorithms like
PolyBoRi and Singular. For the CP model, we use MiniZinc as a CSP modeling
language to support many state-of-the-art CP solvers like Or-Tools, Gecode, and
Choco. For MILP encoding, Autoguess has a direct interface to Gurobi. For SMT,
we apply PySMT to support many solvers, like Z3, CVC4, Boolector, MathSAT,
and Yices. For direct access to SAT solvers, we use PySAT, which supports
many modern SAT solvers like CaDiCaL, Lingeling, Minisat, and MapleSAT.
Additionally, PySAT supports a variety of cardinality constraint encodings.


Fig. 1. The program flow of Autoguess


5.1 Preprocessing Phase 


When translating a system of equations to connection relations, we only consider
the connectivity relations between the variables and not the algebraic structure
of the original system of equations. This is why neither the CP-based nor the
Gröbner basis-based method can exploit such an algebraic structure. However,
by taking the algebraic structure into account and adding new equations, we
might be able to achieve a better result. Following the algebraic approaches to
solving multivariate polynomial equations over finite fields, such as the Gröbner
basis and XL algorithms [5,9,13], we can even derive further equations. To do
so, we use the reduced row echelon form of the degree-D Macaulay matrix, which is
defined as follows.


Definition 6 [13]. For any integer $k$, let $T_k$ be the set of monomials of degree
smaller than or equal to $k$ in $\mathbb{F}_2[x_0, \dots, x_{n-1}]$. The degree-$D$ Macaulay matrix of
a system of equations $F$, denoted by $\mathrm{Mac}_D(F)$, is the matrix with coefficients in
$\mathbb{F}_2$ whose columns are indexed by $T_D$ and rows by the set $\{(u, f_i) \mid i \in [1; m],\ u \in T_{D-\deg(f_i)}\}$,
and whose coefficients are those of the products $u f_i$ in the basis $T_D$.
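
The definition can be illustrated with a small pure-Python sketch. It assumes Boolean polynomials (so $x_i^2 = x_i$), represents a monomial as a frozenset of variable indices and a polynomial as a set of monomials, and uses hypothetical helper names; it is not the implementation used in Autoguess. The toy system $x_0 x_1 + x_2 = 0$, $x_0 x_1 + x_3 + 1 = 0$ is made up for the example.

```python
from itertools import combinations


def monomials_up_to(n, degree):
    """Square-free monomials in x0..x(n-1) of degree <= `degree`."""
    return [frozenset(c) for d in range(degree + 1)
            for c in combinations(range(n), d)]


def multiply(mono, poly):
    """Multiply a polynomial by a monomial over F2 with x_i^2 = x_i."""
    result = set()
    for term in poly:
        result ^= {term | mono}            # equal terms cancel modulo 2
    return result


def macaulay_rows(polys, n, D):
    """Rows u * f for every f and every monomial u with deg(u) <= D - deg(f)."""
    rows = []
    for f in polys:
        deg_f = max(len(t) for t in f)
        for u in monomials_up_to(n, D - deg_f):
            row = multiply(u, f)
            if row:                         # skip rows that vanish
                rows.append(row)
    return rows


def reduce_rows(rows):
    """Reduced row echelon form over F2; columns ordered by decreasing degree."""
    def key(m):
        return (-len(m), sorted(m))
    basis = []                              # (pivot monomial, row) pairs
    for row in rows:
        row = set(row)
        for pivot, b in basis:
            if pivot in row:
                row ^= b
        if row:
            pivot = min(row, key=key)       # leading monomial of the new row
            basis = [(p, b ^ row if pivot in b else b) for p, b in basis]
            basis.append((pivot, row))
    return [b for _, b in basis]


# x0*x1 + x2 and x0*x1 + x3 + 1 (frozenset() is the constant monomial 1).
f1 = {frozenset({0, 1}), frozenset({2})}
f2 = {frozenset({0, 1}), frozenset({3}), frozenset()}
reduced = reduce_rows(macaulay_rows([f1, f2], n=4, D=3))
linear = [row for row in reduced if max(len(t) for t in row) <= 1]
print(linear)
```

Here the reduction exposes the linear equation $x_2 + x_3 + 1 = 0$, which is invisible to the connection relations of the two original equations and would be added as the symmetric relation $[x_2, x_3]$.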


According to our observations, using the new algebraic equations derived
with the reduced Macaulay matrix can result in a smaller guess basis. Although
the reduced row echelon form of the Macaulay matrix can be used to compute the
Gröbner basis if the degree $D$ is large enough [15], we use a relatively small $D$
($D \le 3$) to derive further relations in a preprocessing phase when there are
some algebraic equations over the finite field $\mathbb{F}_2$ among the given original equations.
We include the Macaulay matrix preprocessing phase in Autoguess. This
feature allows the user to include algebraic equations in the input text file, yielding a
hybrid relation file with connection relations as well as algebraic relations.
Autoguess applies the preprocessing phase with a specified degree to the algebraic
equations and converts the derived equations into connection relations before
encoding the GD attack.


5.2 Early-Abort Technique


Besides the size of the guess basis, we can also detect other properties with a
significant impact on the computational complexity of the resulting GD attack.
For instance, additional conditions can help eliminate wrong key guesses early
to reduce complexity. These correspond to multiple independent paths to a
variable in the determination flow. Besides the deduction of a variable from
multiple independent paths, the unused equations between deduced variables before
guessing the entire basis can be used for an early abort of wrong guesses as
well. If either of these two conditions holds only after guessing the entire guess basis,
we can still use it for checking the correctness of our guesses and reducing the
data complexity, though it cannot be used for an early abort and for reducing
the time complexity. To help the user detect the unused equations
and the variables deduced from multiple paths, Autoguess returns all unused
equations as well as the variables deduced from multiple paths, in addition to
the determination flow.
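
The idea of unused equations can be sketched as follows. The Python fragment below is a simplified illustration that handles symmetric relations only; the function name `determination_flow` and the toy system are assumptions, and the report produced by Autoguess may be organized differently.

```python
def determination_flow(guessed, symmetric):
    """Propagate knowledge and record which relation determined each variable."""
    known = set(guessed)
    used = {}                               # variable -> index of deriving relation
    changed = True
    while changed:
        changed = False
        for idx, rel in enumerate(symmetric):
            missing = set(rel) - known
            if len(missing) == 1:           # relation fires and yields one variable
                (var,) = missing
                known.add(var)
                used[var] = idx
                changed = True
    # Relations that never fired but are fully known act as consistency checks.
    checks = [idx for idx, rel in enumerate(symmetric)
              if idx not in used.values() and set(rel) <= known]
    return known, used, checks


# Toy system (made up): [a, b], [a, c], [b, c, d], [a, d]; only a is guessed.
relations = [("a", "b"), ("a", "c"), ("b", "c", "d"), ("a", "d")]
known, used, checks = determination_flow({"a"}, relations)
print(sorted(known), used, checks)
```

In this example the last relation is never needed to determine a variable, so it can be evaluated on the deduced values to discard a wrong guess for `a`.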


6 Application to Automatic Search for Key Bridges 


6.1 Application to PRESENT 


PRESENT is an ISO-standard ultra-lightweight SPN block cipher with 64-bit
block size and an 80-bit (or 128-bit) key $K = k_0 \dots k_{79}$ (or $K = k_0 \dots k_{127}$) as
input.

The key schedule of PRESENT includes bit-wise rotation, constant addition
and an S-box. To model the S-box we assume that the output bits are deduced if
all input bits are known, and vice versa. Let $k_{r,0}, \dots, k_{r,79}$ represent whether the
key bits $k_0, \dots, k_{79}$ are known in round $r$, where $0 \le r \le 31$, and $k_{0,0}, \dots, k_{0,79}$
the master key bits. The connection relations for $R$ rounds of the key schedule are


$k_{r+1,i} \Leftrightarrow k_{r,(i+61 \bmod 80)}$;    (1)
$k_{r,0}, k_{r,1}, k_{r,2}, k_{r,3} \Rightarrow k_{r+1,i}$;  $k_{r+1,0}, k_{r+1,1}, k_{r+1,2}, k_{r+1,3} \Rightarrow k_{r,i}$;  $0 \le i \le 3$,


where $0 \le r \le R-1$. Similarly, to model $R$ rounds of the key schedule of
PRESENT-128, it is sufficient to use the following relations alongside those in Eq. 1:


$k_{r,4}, k_{r,5}, k_{r,6}, k_{r,7} \Rightarrow k_{r+1,i}$;  $k_{r+1,4}, k_{r+1,5}, k_{r+1,6}, k_{r+1,7} \Rightarrow k_{r,i}$  for $4 \le i \le 7$.    (2)


The best attacks on PRESENT so far are the linear attacks on 28 rounds of both
variants of this cipher proposed at EUROCRYPT 2020 [11]. They try to use the
dependencies between sub-key bits involved in the key recovery to reduce the
time complexity of their general key recovery algorithms. However, the authors
admit that they have been unable to provide an efficient general algorithm which
takes account of all dependency relationships between the key bits. In total, 96
key bits need to be guessed in the 26-round attack on PRESENT-80 in [11]:


$$\begin{aligned}
T = \{ & k_{0,16\sim47}, k_{1,20\sim27}, k_{1,36\sim43}, k_{25,0}, k_{25,2}, k_{25,8}, k_{25,10}, k_{25,16}, k_{25,18}, k_{25,24}, k_{25,26}, \\
       & k_{25,32}, k_{25,34}, k_{25,40}, k_{25,42}, k_{25,48}, k_{25,50}, k_{25,56}, k_{25,58} \} \cup \{k_{26,2i} : 0 \le i \le 31\}.
\end{aligned}$$

Exploiting the dependencies between the involved key bits, the authors showed
that all can be deduced from 61 bits. However, using Autoguess running on a
single core of an Intel Core i9 processor at 3.6 GHz, we can find a minimal guess
basis $K_T$ of size 60 in less than 3 s with the Gröbner basis approach, which
includes the following variables:


$\{k_{26,2i} : 0 \le i \le 7\} \cup \{k_{6,42}, k_{26,15\sim22}, k_{26,24}, k_{26,26\sim63}, k_{26,67}, k_{26,69}, k_{26,75}, k_{26,77}\}$.


According to [11], the cost of computing the multiple linear cryptanalysis statistic
throughout the analysis phase of the multiple linear attack on 26 rounds of
PRESENT is $M_2 \cdot 2^{|K_T|}$, where $M_2 = 16$. Consequently, our finding reduces the
time complexity of the analysis phase from $16 \cdot 2^{61} = 2^{65}$ in [11] to $16 \cdot 2^{60} = 2^{64}$.

Moreover, to compute the time complexity of the analysis phase in the
28-round multiple linear attack on PRESENT-128 [11], it is claimed that all relevant
key bits can be deduced from 114 bits, whereas the minimum-size guess basis we
could discover with our tool includes 115 bits. When we contacted the authors of
[11], they confirmed that this is a typo and that they had also discovered a guess
basis of size 115 in their analysis. Thus, the time complexity of the analysis phase in this
attack is more than $2^{121.58}$, whereas [11] claimed less than $2^{121}$. The total time
complexity of the attack remains at $2^{122}$, as the analysis phase is not the bottleneck.


6.2 Application to LBlock with Nonlinear Key Schedule 


Application to Integral Attack on 24 Rounds of LBlock. The best known
single-key attack on LBlock [20], except for biclique attacks [19], is the
24-round integral attack in [6]. We follow the same notation as [6]. To mount
a key-recovery attack on 24 rounds of LBlock, they use a 17-round integral
distinguisher based on which the correctness of $\bigoplus Z_1^{17}[4]\{3,2\} = \bigoplus X_1^{18}[4]\{3,2\}$
must be checked. Thanks to the meet-in-the-middle technique, they compute
$\bigoplus Z_1^{17}[4]$ and $\bigoplus X_1^{18}[4]$ independently to reduce the time complexity further. To
calculate $\bigoplus Z_1^{17}[4]$, 80 key bits are involved, but our tool automatically
detects in less than a second that they can be determined from 55 key bits.
On the other hand, 48 key bits are involved in the calculation of $\bigoplus X_1^{18}[4]$, whereas
our tool automatically detects that they can be determined from 47 key bits. Lastly,
our tool detected that all involved key bits can be deduced from 69 variables: 
$G = \{k_{24,0\sim 8}, k_{24,17\sim 30}, k_{24,34\sim 79}\}$, where $k_{r,i}$ is the $i$th sub-key bit in round $r$. 


Application to Impossible Differential Attack on 23 Rounds of LBlock. 
Applying our tool to find key bridges for the impossible differential attack on 
23-round LBlock, we can reproduce the same result as [16] in a few seconds with 
the Gröbner basis method. While [16] only detects the number of independent 
relations between the key bits, our tool not only finds a guess basis of 73 bits 
for 144 involved sub-key bits, but also produces a precise determination flow. 


7 Application to GD Attack on Block Ciphers 


We now show the usefulness of our tool to find GD attacks on block ciphers. 
The full version [12] includes further applications to CRAFT and SKINNY. 


7.1 Automatic GD Attack on AES 


Let $w_{r,i,j}$, $x_{r,i,j}$, $y_{r,i,j}$, and $z_{r,i,j}$ denote whether the $j$th byte at the $i$th row before 
AK, SB, SR, and MC is known in round $r$ of AES. Given that $x_{r,i,j}$ is known if 
and only if $y_{r,i,j}$ is known for all $0 \le i, j \le 3$, we can assume that $x_{r,i,j} = y_{r,i,j}$. 
Assuming that $M$ is the $4 \times 4$ MDS matrix of AES, if $w = M \times z$, then 
knowing four bytes of $(z, w)$ is sufficient to uniquely determine the remaining 
four bytes. Furthermore, $z_{r,i,j} = y_{r,i,(j+i) \bmod 4}$ and $y_{r,i,j} = x_{r,i,j}$ for all 
$0 \le i, j \le 3$. Hence, each matrix multiplication $(w_{r,0,j}, w_{r,1,j}, w_{r,2,j}, w_{r,3,j})^T = 
M \times (z_{r,0,j}, z_{r,1,j}, z_{r,2,j}, z_{r,3,j})^T$ with $0 \le j \le 3$ in the MC layer can be modeled 
via $\binom{8}{5} = 56$ symmetric relations, each of which includes five variables from 
the following set: $\{w_{r,0,j}, w_{r,1,j}, w_{r,2,j}, w_{r,3,j}, x_{r,0,j}, x_{r,1,j+1}, x_{r,2,j+2}, x_{r,3,j+3}\}$, with column indices taken modulo 4. In 
total, $4 \times \binom{8}{5} = 224$ symmetric relations are required to model the MC layer. 
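
The counting argument for the MC layer can be illustrated with a short enumeration. This is only a sketch: the tuple encoding of variables and the function name `mc_relations` are assumptions made for the example and do not reflect the input format of Autoguess.

```python
from itertools import combinations
from math import comb


def mc_relations(r: int, j: int):
    """Enumerate the 5-variable symmetric relations of one MixColumns column.

    Because M is MDS, any four of the eight input/output bytes determine the
    other four, so every 5-subset forms a symmetric relation.
    """
    outputs = [("w", r, i, j) for i in range(4)]
    # After ShiftRows, row i of column j comes from column (j + i) mod 4.
    inputs = [("x", r, i, (j + i) % 4) for i in range(4)]
    return list(combinations(outputs + inputs, 5))


one_column = mc_relations(0, 0)
one_round = [rel for j in range(4) for rel in mc_relations(0, j)]
print(len(one_column), len(one_round))
```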

Let $k_{r,i,j}$ denote whether the $j$th byte in the $i$th row of the sub-key in round 
$r$ is known, where $0 \le i, j \le 3$ and $0 \le r \le 10$. Since the round constants are 
known, we can model the key schedule of AES via linear algebraic relations: For 
each key variable $k_{r,i,3}$ in column 3 of the key state, we define a new 
variable $sk_{r,i,3}$ as well as the new symmetric relation $[k_{r,i,3}, sk_{r,i,3}]$, since $k_{r,i,3}$ 
can be uniquely determined from $sk_{r,i,3}$ and vice versa. Thus, we can model the 
key schedule as 


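\[
\begin{aligned}
&k_{r,i,j} \oplus k_{r+1,i,j-1} \oplus k_{r+1,i,j} = 0, \quad 0 \le i \le 3,\ 1 \le j \le 3, \\
&k_{r,3,0} \oplus sk_{r,0,3} \oplus k_{r+1,3,0} = 0, \quad k_{r,1,0} \oplus sk_{r,2,3} \oplus k_{r+1,1,0} = 0, \\
&k_{r,2,0} \oplus sk_{r,3,3} \oplus k_{r+1,2,0} = 0, \quad k_{r,0,0} \oplus sk_{r,1,3} \oplus k_{r+1,0,0} = 0.
\end{aligned}
\]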
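
As an illustration, the relations above can be generated programmatically. The following sketch uses an ad hoc tuple encoding of variables (an assumption of this example, not the Autoguess input syntax) and lists, for a given number of rounds, the XOR relations and the symmetric S-box relations of the key schedule.

```python
def aes_key_schedule_relations(rounds: int):
    """Return (xor_relations, sbox_relations) for `rounds` key-schedule steps.

    Each XOR relation lists three variables whose XOR is a known constant;
    each S-box relation links k[r][i][3] with its S-box image sk[r][i][3].
    """
    xor_relations, sbox_relations = [], []
    for r in range(rounds):
        for i in range(4):
            sbox_relations.append((("k", r, i, 3), ("sk", r, i, 3)))
            # Column 0 uses the rotated, substituted last column of round r.
            xor_relations.append(
                (("k", r, i, 0), ("sk", r, (i + 1) % 4, 3), ("k", r + 1, i, 0))
            )
            # Columns 1..3 chain with the previous column of round r + 1.
            for j in range(1, 4):
                xor_relations.append(
                    (("k", r, i, j), ("k", r + 1, i, j - 1), ("k", r + 1, i, j))
                )
    return xor_relations, sbox_relations


xor_rels, sbox_rels = aes_key_schedule_relations(10)
print(len(xor_rels), len(sbox_rels))
```

Each round contributes 16 XOR relations and 4 symmetric relations, so the model stays small even for the full key schedule.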


In the AK layer, 16 bytes of sub-key are XORed to the internal state, which can 
be modeled via the connection relation $[w_{r-1,i,j}, k_{r,i,j}, x_{r,i,j}]$ for all $0 \le i, j \le 3$. 

Consider an adversary who seeks to break 3 rounds of AES where only a 
single known plaintext is available. Given the 3-round connection relations and 
the known and target variables, Autoguess (with or without preprocessing) finds 
a minimal guess basis of 15 bytes. This means that there is a GD attack with time 
complexity $2^{120}$ on 3 rounds of AES. As Autoguess yields the variables determined 
from multiple paths as well as relations not used during the determination flow, 
we can ensure that only one known plaintext/ciphertext pair is sufficient to 
uniquely determine the unknown variables. 

Running Autoguess on a single core of an Intel Core i9 processor at 3.6 GHz with 
the SAT-based method (CaDiCaL), it took less than a minute to find the GD 
attack on 3 rounds of AES, whereas it took about 10 h when we used the Gröbner 
basis approach (PolyBoRi). The MILP-based method (Gurobi) is also much 
slower, even if we want to find only a feasible solution. For the 3-round GD 
attack on AES, Autoguess gives the same result as the dedicated AES tool in [2] 
when only one plaintext/ciphertext pair is known. 


8 Application to GD Attack on Stream Ciphers 


8.1 Automatic GD Attack on ZUC 


ZUC is a word-based stream cipher with two versions, ZUC-128 [8] and ZUC-256 
[21], both performing exactly the same keystream generation phase (Fig. 2). 


Fig. 2. The keystream generation phase of ZUC stream cipher 


The LFSR consists of 16 cells $S_t, \ldots, S_{t+15}$ at clock $t$, each a 31-bit element 
from the finite field $GF(p)$, $p = 2^{31} - 1$. The LFSR rule is 


\[ S_{t+16} = 2^{15} S_{t+15} + 2^{17} S_{t+13} + 2^{21} S_{t+10} + 2^{20} S_{t+4} + (1 + 2^{8}) S_t \bmod p. \tag{3} \]


If $S_{t+16} = 0$, then $S_{t+16} = p$. If we denote the value of registers X0,...,X3 
at clock $t$ as $X0_t, X1_t, X2_t, X3_t$, then we have $X0_t = SH_{t+15} \| SL_{t+14}$, $X1_t = 
SL_{t+11} \| SH_{t+9}$, $X2_t = SL_{t+7} \| SH_{t+5}$, and $X3_t = SL_{t+2} \| SH_t$, where $SH_t$ and 
$SL_t$ represent the high and low 16 bits of register $S_t$, i.e., $SH_t = S_t[30 \ldots 15]$ 
and $SL_t = S_t[15 \ldots 0]$. Note that, in this representation, $SL_t[15] = SH_t[0]$. We 
have: 


\[ Z_t = ((R1_t \oplus X0_t) \boxplus_{32} R2_t) \oplus X3_t, \tag{4} \]
\[ W1_t = R1_t \boxplus_{32} X1_t, \quad W2_t = R2_t \oplus X2_t, \tag{5} \]
\[ R1_{t+1} = S(L_1(W1L_t \| W2H_t)), \quad R2_{t+1} = S(L_2(W2L_t \| W1H_t)), \tag{6} \]


where $R1_t$ and $R2_t$ represent the values of the 32-bit registers R1 and R2 at clock 
$t$, and $W1H_t$ and $W1L_t$ represent the high and low 16 bits of W1 at clock $t$, 
respectively. $W2L_t$ and $W2H_t$ are defined in the same way. Here $S$ is a $32 \times 32$ 
one-to-one S-box, and $L_1$, $L_2$ are two $32 \times 32$ linear functions. 
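
The LFSR rule (3) and the reorganization of LFSR cells into the words X0, ..., X3 can be sketched as follows. The function names are chosen for this example only; the S-box $S$ and the linear maps $L_1$, $L_2$ are omitted, so this is not a ZUC implementation.

```python
P = 2**31 - 1  # modulus of the LFSR field GF(p)


def lfsr_next(s):
    """Given the 16 cells S_t..S_{t+15}, return S_{t+16} following Eq. (3)."""
    v = (2**15 * s[15] + 2**17 * s[13] + 2**21 * s[10]
         + 2**20 * s[4] + (1 + 2**8) * s[0]) % P
    return P if v == 0 else v  # the value 0 is replaced by p


def high(cell):
    """SH = S[30..15], the high 16 bits of a 31-bit cell."""
    return (cell >> 15) & 0xFFFF


def low(cell):
    """SL = S[15..0], the low 16 bits of a 31-bit cell."""
    return cell & 0xFFFF


def bit_reorganization(s):
    """Build X0..X3 from the cells S_t..S_{t+15} (index 0 is S_t)."""
    x0 = (high(s[15]) << 16) | low(s[14])
    x1 = (low(s[11]) << 16) | high(s[9])
    x2 = (low(s[7]) << 16) | high(s[5])
    x3 = (low(s[2]) << 16) | high(s[0])
    return x0, x1, x2, x3


state = list(range(1, 17))  # arbitrary small test values, S_t = 1, ..., S_{t+15} = 16
print(lfsr_next(state), bit_reorganization(state))
```

Since `high` and `low` overlap in bit 15 of the cell, the identity $SL_t[15] = SH_t[0]$ holds by construction; this overlap is what the early-abort check of the attack exploits.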

The best previous GD attack on ZUC, proposed in [10], has a computational 
complexity of $2^{392}$ while requiring 9 keystream outputs. Here, assisted by Autoguess, 
we propose a GD attack on ZUC with a computational complexity of $2^{390}$, 
using the same number of 9 keystream outputs as [10]. 

To find GD attacks on ZUC, we use a half-word-based model. Let $R1H_t$, $R1L_t$ 
be the high and low 16 bits of R1 at clock $t$, and $R2H_t$, $R2L_t$ those of R2. Using (4): 


\[ ZL_t = ((SL_{t+14} \oplus R1L_t) \boxplus_{16} R2L_t) \oplus SH_t, \tag{7} \]
\[ ZH_t = ((SH_{t+15} \oplus R1H_t) \boxplus_{16} R2H_t \boxplus_{16} c1_t) \oplus SL_{t+2}, \tag{8} \]


where $ZH_t$ and $ZL_t$ denote the high and low 16 bits of the output word $Z$ at clock 
$t$, and $c1_t$ is the carry bit of the modular addition $(SL_{t+14} \oplus R1L_t) \boxplus_{16} R2L_t$: 


\[ c1_t = \begin{cases} 1 & \text{if } (SL_{t+14} \oplus R1L_t) + R2L_t \ge 2^{16}, \\ 0 & \text{if } (SL_{t+14} \oplus R1L_t) + R2L_t < 2^{16}. \end{cases} \tag{9} \]


Splitting the left and right parts of Eq. (5) into two 16-bit halves, we have: 


\[ W1L_t = R1L_t \boxplus_{16} SH_{t+9}, \quad W1H_t = R1H_t \boxplus_{16} SL_{t+11} \boxplus_{16} c2_t, \tag{10} \]
\[ W2L_t = R2L_t \oplus SH_{t+5}, \quad W2H_t = R2H_t \oplus SL_{t+7}, \tag{11} \]


where $c2_t$ is the carry bit of the modular addition $R1L_t \boxplus_{16} SH_{t+9}$: 


\[ c2_t = \begin{cases} 1 & \text{if } R1L_t + SH_{t+9} \ge 2^{16}, \\ 0 & \text{if } R1L_t + SH_{t+9} < 2^{16}. \end{cases} \tag{12} \]


All of the derived relations can be simply modeled via symmetric relations, 
except for Eqs. (9) and (12), which can be modeled via implication relations. 
Besides the main equations, we add trivial implications to link each 32-bit word 
to its two half-words. For example, to link $S_t$ to $SH_t$ and $SL_t$, we include $S_t \Rightarrow 
SH_t$, $S_t \Rightarrow SL_t$, and $SH_t, SL_t \Rightarrow S_t$ in the model. 
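
The half-word decomposition can be sanity-checked numerically. The sketch below, with helper names chosen for this example, compares the 32-bit formulas (4) and (5) against their 16-bit counterparts with explicit carry bits on random inputs.

```python
import random

MASK16, MASK32 = 0xFFFF, 0xFFFFFFFF


def z_word(r1, r2, x0, x3):
    """Eq. (4): Z = ((R1 xor X0) + R2 mod 2^32) xor X3."""
    return (((r1 ^ x0) + r2) & MASK32) ^ x3


def z_halves(r1, r2, x0, x3):
    """Eqs. (7)-(9): the same output computed on 16-bit halves with carry c1."""
    r1h, r1l = r1 >> 16, r1 & MASK16
    r2h, r2l = r2 >> 16, r2 & MASK16
    x0h, x0l = x0 >> 16, x0 & MASK16  # play the roles of SH_{t+15}, SL_{t+14}
    x3h, x3l = x3 >> 16, x3 & MASK16  # play the roles of SL_{t+2}, SH_t
    low_sum = (x0l ^ r1l) + r2l
    c1 = 1 if low_sum >= 2**16 else 0
    zl = (low_sum & MASK16) ^ x3l
    zh = (((x0h ^ r1h) + r2h + c1) & MASK16) ^ x3h
    return zh, zl


def w1_halves(r1, x1):
    """Eqs. (10) and (12): W1 = R1 + X1 mod 2^32 on 16-bit halves with carry c2."""
    r1h, r1l = r1 >> 16, r1 & MASK16
    x1h, x1l = x1 >> 16, x1 & MASK16  # play the roles of SL_{t+11}, SH_{t+9}
    c2 = 1 if r1l + x1l >= 2**16 else 0
    return (r1h + x1h + c2) & MASK16, (r1l + x1l) & MASK16


def check(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        r1, r2, x0, x1, x3 = (rng.getrandbits(32) for _ in range(5))
        zh, zl = z_halves(r1, r2, x0, x3)
        assert (zh << 16) | zl == z_word(r1, r2, x0, x3)
        w1h, w1l = w1_halves(r1, x1)
        assert (w1h << 16) | w1l == (r1 + x1) & MASK32
    return True


print(check())
```

The carry bits are the only places where the two halves interact, which is why they appear as separate one-bit variables in the model.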

Therefore, we can generate the system of connection relations modeling the 
knowledge propagation through a given number of clock cycles of ZUC 
based on half-words. In contrast to our previous models, where all variables have 
the same size, the relations for ZUC use variables with different lengths. To account for 
the length of each variable, we use a weighted objective function in Line 20 of 
Algorithm 1, such that each variable is multiplied by its length. Consequently, solving 
the generated model yields a guess basis of minimum weight, which corresponds 
to the minimum number of guessed bits needed to derive the internal state of ZUC. 
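
The effect of weighting can be seen on a toy system. The following brute-force sketch is not the CP/MILP encoding used by Autoguess; the variable names, weights, and relations are invented for illustration. It searches for a guess basis of minimum total bit length under symmetric and implication relations.

```python
from itertools import combinations


def propagate(known, symmetric, implications):
    """Close a set of known variables under the given relations."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for rel in symmetric:  # all but one known => the last one is known
            unknown = [v for v in rel if v not in known]
            if len(unknown) == 1:
                known.add(unknown[0])
                changed = True
        for lhs, rhs in implications:  # all of lhs known => rhs known
            if rhs not in known and all(v in known for v in lhs):
                known.add(rhs)
                changed = True
    return known


def min_weight_guess_basis(weights, symmetric, implications, targets):
    """Exhaustively find a guess basis minimizing the sum of variable lengths."""
    best = None
    names = sorted(weights)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(weights[v] for v in subset)
            if best is not None and cost >= best[0]:
                continue
            if set(targets) <= propagate(subset, symmetric, implications):
                best = (cost, subset)
    return best


# Toy example: a 31-bit cell S with halves SH, SL and a 16-bit addition W = R + SH.
weights = {"S": 31, "SH": 16, "SL": 16, "R": 16, "W": 16}
symmetric = [("R", "SH", "W")]
implications = [(("S",), "SH"), (("S",), "SL"), (("SH", "SL"), "S")]
cost, basis = min_weight_guess_basis(weights, symmetric, implications, {"SH", "W"})
print(cost, basis)
```

With unit weights, guessing the full cell S together with R would count as an equally good basis of two variables; the weighted objective prefers guessing only the half-word SH.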

Applying Autoguess to the connection relations for 9 clock cycles of ZUC, we 
see that finding a minimal guess basis is very time-consuming, and state-of-the-art 
MILP or CP solvers cannot solve the problem in a reasonable time. However, 
we are still able to find feasible solutions that yield guess bases smaller 
than that in [10] by two bits. One of the guess bases of size 391 bits we found is 


$G = \{S_5, R1_5, S_{19}, SH_{13}, S_6, S_7, S_9, S_{10}, SL_{13}[14 \ldots 0], S_{15}, S_{16}, S_{18}, S_{20}, c2_5, SH_{12}, c1_3\}.$ 


Here, $Z_t$ and thus $ZL_t$, $ZH_t$ are known for $0 \le t \le 8$. Thanks to the output of 
Autoguess, which provides the full determination flow, we see that the two halves 
$SL_{11}$ and $SH_{11}$ of $S_{11}$ can be deduced from two independent paths. Furthermore, 
Autoguess shows that $SL_{11}$ and $SH_{11}$ are independent of $c1_3$ and $SH_{12}$. To find this, 
we simply consider $G \setminus \{c1_3, SH_{12}\}$ as the known variables and $T = \{SL_{11}, SH_{11}\}$ 
as the target variables and run Autoguess to see whether there is a guess basis of 
size zero, in which case $SL_{11}$ and $SH_{11}$ can be uniquely deduced from $G \setminus \{c1_3, SH_{12}\}$. As 
a consequence, we can check the condition $SH_{11}[0] = SL_{11}[15]$ before guessing 
$c1_3$ and $SH_{12}$ as an early-abort technique to filter out half of the wrong guesses. 
Algorithm 2 describes our GD attack on ZUC more precisely. Before Line 14 of 
Algorithm 2, 374 bits are guessed in total. Since the condition in Line 14 holds 
with a probability of $2^{-1}$, the lines after Line 14 will be performed $2^{373}$ times 
on average, where there is a loop enumerating $2^{17}$ possible cases for $(SH_{12}, c1_3)$. 
Hence, the total computational complexity of our GD attack on ZUC is $2^{390}$, in 
which we use 9 keystream output words to determine the whole internal state. 
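
The bit counts in this complexity estimate can be reproduced directly. The sketch below assumes the variable sizes given in the text (31-bit LFSR cells, a 32-bit register R1, 16-bit half-words, one-bit carries); the dictionary keys are labels for this example only.

```python
CELL = 31  # bit length of one LFSR cell

guess_bits = {
    "S5": CELL, "R1_5": 32, "S19": CELL, "SH13": 16,    # outermost loop
    "S20": CELL, "S7": CELL,                            # second loop
    "S16": CELL, "S10": CELL, "c2_5": 1, "S9": CELL,
    "S18": CELL, "S15": CELL, "S6": CELL,               # third loop
    "SL13[14..0]": 15,                                  # fourth loop
    "SH12": 16, "c1_3": 1,                              # loop after the filter
}

total_bits = sum(guess_bits.values())
late_bits = guess_bits["SH12"] + guess_bits["c1_3"]
bits_before_filter = total_bits - late_bits

# The filter SH11[0] = SL11[15] passes with probability 1/2, so the last loop
# runs 2^(bits_before_filter - 1) times, each time over 2^late_bits candidates.
log2_complexity = (bits_before_filter - 1) + late_bits
print(total_bits, bits_before_filter, log2_complexity)
```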


9 Key-Recovery-Friendly Distinguishers 


Most automatic CP-based methods to search for distinguishers are blind to the 
key recovery phase and only describe the distinguishing part. Very recently, 
some authors attempted a unified CP-model combining the distinguisher and 
key recovery phases to directly find a key-recovery attack, but they are limited 
to linear key schedules [4,17,18]. To the best of our knowledge, no general CP- 
based model integrating both phases that finds key bridges for both linear and 
nonlinear key schedules has been introduced so far. We now show that our CP- 
based key-bridge search is consistent with previous CP-based methods to search 
for distinguishers, and hence they can be merged to directly find a key-recovery- 
friendly distinguisher. To do so, we extend the CP model of [18] with key recovery 
using key-bridging while searching for DS-MITM distinguishers for ciphers with 
linear and nonlinear key schedules. 


Algorithm 2: GD attack on ZUC with time complexity 2^390 


Input: Output keystream derived from 9 clock cycles of ZUC: (Z_0, ..., Z_8) 
1 forall (S_5, R1_5, S_19, SH_13) ∈ F_2^110 do 
2   W1L_4, W2H_4 ⇐ (6); R1L_4 ⇐ (10); R2L_5 ⇐ (7); c1_5 ⇐ (9); c2_4 ⇐ (12); 
3   forall (S_20, S_7) ∈ F_2^62 do 
4     X0_5 ⇐ SH_20 || SL_19; X3_5, X2_0 ⇐ SL_7 || SH_5; R2_5 ⇐ (4); W1H_4, W2L_4 ⇐ (6); 
5     forall (S_16, S_10, c2_5, S_9, S_18, S_15, S_6) ∈ F_2^187 do 
6       W1H_5 ⇐ (10); W2L_5 ⇐ (11); R2_6 ⇐ (6); X1_4, X2_8 ⇐ SL_15 || SH_13; S_21 ⇐ (3); 
7       R1_4 ⇐ (5); R2L_4 ⇐ (11); R1L_6 ⇐ (7); S_22 ⇐ (3); X0_4 ⇐ SH_19 || SL_18; 
8       SH_4 ⇐ (7); c1_4 ⇐ (9); R2H_4 ⇐ (8); X3_4 ⇐ SL_6 || SH_4; W1L_3, W2H_3 ⇐ (6); 
9       c1_6 ⇐ (9); X2_2, X3_7 ⇐ SL_9 || SH_7; c2_6 ⇐ (12); W1L_3, W2H_3 ⇐ (6); 
10      W1L_6 ⇐ (10); SL_11 ⇐ (11); R2_4 ⇐ (4); X0_7 ⇐ SH_22 || SL_21; R2H_3 ⇐ (11); 
11      forall SL_13[14...0] ∈ F_2^15 do 
12        W2H_6 ⇐ (11); R1_7 ⇐ (6); S_3 ⇐ (3); R2_7 ⇐ (4); W2L_6, W1H_6 ⇐ (6); 
13        SH_11 ⇐ (11); X1_0 ⇐ SL_11 || SH_9; X1_2 ⇐ SL_13 || SH_11; 
14        if SH_11[0] = SL_11[15] then 
15          forall (SH_12, c1_3) ∈ F_2^17 do 
16            W1H_3, W2L_3 ⇐ (6); c2_3 ⇐ (12); R1H_3 ⇐ (8); R1L_3 ⇐ (10); 
17            SL_14 ⇐ (10); X1_7 ⇐ SL_18 || SH_16; W1_7 ⇐ (5); W2H_7 ⇐ (11); 
18            R1_8 ⇐ (6); W2L_7 ⇐ (11); R2_8 ⇐ (6); SH_8 ⇐ (7); R2L_3 ⇐ (11); 
19            SL_17 ⇐ (7); R1H_6 ⇐ (10); SL_8 ⇐ (8); X2_1 ⇐ SL_8 || SH_6; 
20            W1L_2, W2H_2 ⇐ (6); W1H_2, W2L_2 ⇐ (6); R1L_2 ⇐ (10); R1_2 ⇐ (5); 
21            W1L_1, W2H_1 ⇐ (6); R2_2 ⇐ (5); W2L_1, W1H_1 ⇐ (6); 
22            X1_6 ⇐ SL_17 || SH_15; R1_6 ⇐ (5); W1L_5, W2H_5 ⇐ (6); SL_12 ⇐ (11); 
23            X1_1 ⇐ SL_12 || SH_10; R1_1 ⇐ (5); R2_1 ⇐ (5); W1L_0, W2H_0 ⇐ (6); 
24            R1_0 ⇐ (5); W2L_0, W1H_0 ⇐ (6); R2_0 ⇐ (5); X0_0 ⇐ SH_15 || SL_14; 
25            X3_0 ⇐ (4); SL_2 ⇐ X3_0; R2L_2 ⇐ (11); SH_2 ⇐ (7); 
26            SH_14 ⇐ (10); S_17 ⇐ (3); S_1 ⇐ (3); S_4 ⇐ (3); S_0 ⇐ (3); 
27            if ZUC^{18 clks}(S_0, ..., S_15, R1_0, R2_0) = (Z_0, ..., Z_17) then 
28              return (S_0, ..., S_15, R1_0, R2_0) 
29            else 
30              Go to Line 1 and try another guess. 


Our strategy to integrate the key-bridging technique into the CP-based 
frameworks for automatic search of distinguishers is summarized as follows: 


1. We generate constraints for the distinguisher using previous CP models. 

2. Next, we generate the constraints modeling the active cell propagation before 
and after the distinguisher to determine the involved sub-keys. 

3. To model the key-schedule including the key-bridging technique, we generate 
the constraints for the key-bridging technique based on our approach. 

4. We add constraints to link the variables for the last step of knowledge prop- 
agation in key-bridging to the variables for the involved sub-keys. 

5. Finally, we look for a feasible solution minimizing the actual number of 
guessed sub-key variables such that all involved sub-keys can be deduced. 


The modeling of the distinguisher part and of the active-cell propagation through 
the outer rounds (Items 1 and 2) is discussed in previous works [17,18]. 
We already described Item 3, so we now explain Item 4. 

Among the variables defined in Item 2, let $ISK_{r,i}$ be a binary variable indicating 
whether the $i$th word (bit) of the sub-key in round $r$ is involved. Assuming 
that $wk$ sub-key words (bits) are used in each round, as illustrated in 
Fig. 3, suppose that $B = \{ISK_{r,i} : 0 \le r \le r_b - 1,\ 0 \le i \le wk - 1\}$ and 
$F = \{ISK_{r,i} : r_b + r_m \le r \le r_b + r_m + r_f - 1,\ 0 \le i \le wk - 1\}$ include the 
indicator variables corresponding to the involved sub-key words in the rounds 
before and after the distinguisher. Moreover, assuming that $\beta \in \mathbb{Z}_{\ge 0}$ denotes 
the depth of knowledge propagation in key-bridging, let $SK_{r,i,j}$ specify whether 
the $i$th word of the sub-key in round $r$ is known at the $j$th step of knowledge 
propagation. While our previous models specified the target sub-keys in advance, 
in our new model we include constraints to dynamically determine the target 
sub-key variables: 


\[ \{SK_{r,i,\beta} \ge ISK_{r,i} : 0 \le r \le r_b - 1 \ \lor\ r_b + r_m \le r \le r_b + r_m + r_f - 1,\ 0 \le i \le wk - 1\}. \]


Thus, if $ISK_{r,i} = 1$ then $SK_{r,i,\beta} = 1$, so each involved sub-key variable is deduced 
after $\beta$ steps. Hence, we can minimize the number of guessed variables at the first step 
of knowledge propagation, $\sum_{r,i} SK_{r,i,0}$. To model the distinguisher part and 
the active cells propagation, we use the same method as [18], and follow their 
notations. Although we demonstrate the application of our method only for DS- 
MITM attacks, it can be straightforwardly applied to find key-recovery-friendly 
distinguishers in linear or differential cryptanalysis as well. 
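
The role of the propagation depth can be illustrated on a toy key schedule. The sketch below replaces the CP solver by exhaustive search and uses invented relations among six sub-key words; it only mirrors the constraint that every involved sub-key must be known after the last propagation step while the number of words guessed at step 0 is minimized.

```python
from itertools import combinations


def one_step(known, relations):
    """One synchronous step: a relation with a single unknown reveals it."""
    new = set(known)
    for rel in relations:
        unknown = [v for v in rel if v not in known]
        if len(unknown) == 1:
            new.add(unknown[0])
    return new


def min_guesses(variables, relations, involved, beta):
    """Smallest set of sub-keys guessed at step 0 such that all involved
    sub-keys are known after `beta` propagation steps."""
    for size in range(len(variables) + 1):
        for guess in combinations(variables, size):
            known = set(guess)              # plays the role of SK_{r,i,0}
            for _ in range(beta):
                known = one_step(known, relations)
            if involved <= known:           # SK_{r,i,beta} >= ISK_{r,i}
                return size, guess
    return None


# Toy schedule: 3 rounds with 2 sub-key words each, indexed as (round, word).
variables = [(r, i) for r in range(3) for i in range(2)]
relations = [
    ((0, 0), (1, 1)),          # word moved unchanged to the next round
    ((1, 1), (2, 0)),
    ((0, 1), (1, 0), (2, 1)),  # XOR of three words is constant
]
involved = {(0, 0), (2, 0), (2, 1)}

for beta in range(3):
    print(beta, min_guesses(variables, relations, involved, beta))
```

Without propagation all three involved words must be guessed, whereas one or two propagation steps reduce the guess to two words; in the real model the solver makes this trade-off jointly with the choice of distinguisher.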


9.1 DS-MITM Attack on SKINNY-{64-192, 64-128, 128-256} 


To show the usefulness of our method, we apply it to different SKINNY variants. 
For SKINNY-64-192, we discover a 21-round DS-MITM attack in the single-tweakey 
setting with an 8.5-round distinguisher. Although there is a 9.5-round 
DS-MITM distinguisher for SKINNY-64-192 which can be used to construct a 
21-round attack, thanks to our new model we noticed that building the attack 
on an 8.5-round distinguisher results in an attack with lower complexity. 


Figure 4 illustrates the distinguisher of our attack on 21 rounds of SKINNY- 
64-192, where 31 nibbles should be guessed in the offline phase. Hence, the time 


complexity of the offline phase is 24*#! x 24%? x „SlCr © 218 °8Cy, and the 


memory complexity is $(2^{4 \times 2} - 1) \times 4 \times 2^{4 \times 31} \approx 2^{133.99}$ bits. As shown in 
Fig. 5, 15 nibbles are active in the input state in the first round, so 
the data complexity of our attack is $2^{4 \times 15} = 2^{60}$ chosen plaintexts. Figure 5 also 
shows that 63 sub-key nibbles are involved in the key-recovery attack, but they 
can be deduced from only 45 sub-key nibbles. As a result, the time complexity 
of the online phase is 2#°%* x 24%? x IUI Cp ~ 2186-3 Cp, 

In the same way, we find a 19-round DS-MITM attack on SKINNY-128-256 
and an 18-round attack on SKINNY-64-128, relying on 8.5-round and 7.5-round 
distinguishers. The complexity of our attacks is summarized in Table 1. The full 
version [12] includes more details about our DS-MITM attacks on SKINNY. 


Fig. 4. Distinguisher for DS-MITM attack on 21-round SKINNY-64-192: An 8.5-round 
DS-MITM distinguisher for SKINNY-64-192 such that A = [0,13] (crosshatches in 
round 0), B = [12] (crosshatch at the end), and Deg(A, B) = 31 (in red). The blue cells 
can be determined from the red cells via the connection relations from the linear layer. 
(Color figure online) 


Fig. 5. Key recovery and guessed values for DS-MITM attack on 21-round SKINNY-64-192: 
Backward differential and forward determination relationship in the outer rounds 
using the distinguisher in Fig. 4 as $E_1$. Cells in Guess($E_0$) and Guess($E_2$) are marked 
in red. Derive $k_{E_0}$ and $k_{E_2}$ from these. The orange sub-keys must be guessed (without 
key-bridging). Then, we can derive the orange nibbles of $P^0$ and thence all red nibbles. 
(Color figure online) 


9.2 Improved DS-MITM Attack on TWINE-80 


The best DS-MITM attack on TWINE-80 is a 20-round attack built upon an 
11-round distinguisher [18]. By combining our key-bridging technique 
with the automatic method to search for DS-MITM distinguishers in [18], we 
discovered that using a 10-round distinguisher yields a better key-recovery attack 
on 20-round TWINE-80 in terms of time complexity and memory complexity. 

Our DS-MITM distinguisher requires guessing 14 nibbles (compared to 19 in 
[18]), so the time complexity of the offline phase is $2^{4 \times 14} \times 2^{4 \times 1} \times \frac{14}{8 \times 20} C_E \approx 2^{56.48} C_E$ 
($2^{76.93} C_E$ in [18]), where $C_E$ is the runtime of 20-round TWINE-80. The memory 
complexity is $(2^4 - 1) \times 4 \times 2 \times 2^{4 \times 14} \approx 2^{62.91}$ bits ($2^{82.91}$ in [18]). In total, 26 sub-key 
nibbles are involved in the key-recovery phase of our attack. With key-bridging, 
all 26 sub-key nibbles can be deduced from 19 sub-key nibbles. Moreover, 7 + 12 = 19 
S-boxes are involved in the outer rounds, so the time complexity of the online 
phase is $2^{4 \times 19} \times 2^{4 \times 1} \times \frac{19}{8 \times 20} C_E \approx 2^{76.92} C_E$, slightly lower than in [18]. The 76-bit 
subkey space is reduced by 4 bits. The data complexity is $2^{4 \times 8} = 2^{32}$, since 8 
input nibbles are active. For more details about our DS-MITM attack on 
TWINE-80, see the full version of our paper [12]. 
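
The offline and memory figures quoted above can be re-derived numerically. This sketch rests on one assumption not spelled out in the text: that the cost of a partial computation is counted as the fraction of involved S-boxes over the 8 × 20 S-boxes of 20-round TWINE-80. The function names are illustrative.

```python
import math

TOTAL_SBOXES = 8 * 20  # assumed: 8 S-boxes per round, 20 rounds


def offline_time_log2(guessed_nibbles: int, involved_sboxes: int) -> float:
    """log2 of 2^(4*guessed) * 2^(4*1) * involved/total, in cipher evaluations."""
    return 4 * guessed_nibbles + 4 + math.log2(involved_sboxes / TOTAL_SBOXES)


def memory_log2(guessed_nibbles: int) -> float:
    """log2 of (2^4 - 1) * 4 * 2 * 2^(4*guessed), in bits."""
    return math.log2((2**4 - 1) * 4 * 2) + 4 * guessed_nibbles


ours_time = offline_time_log2(14, 14)  # 10-round distinguisher, 14 nibbles
prev_time = offline_time_log2(19, 19)  # 11-round distinguisher of [18], 19 nibbles
ours_mem = memory_log2(14)
prev_mem = memory_log2(19)
print(round(ours_time, 2), round(prev_time, 2), round(ours_mem, 2), round(prev_mem, 2))
```

Each guessed nibble saved in the distinguisher lowers both the offline time and the memory by a factor of about 16, which explains the gap of roughly 20 bits between the two attacks.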


10 Conclusion 


In this paper, we introduced the CP and SAT/SMT encoding of the GD problem 
and integrated them with MILP- and Gröbner basis-based methods in one 
tool to automate GD and key-bridging techniques. Moreover, we managed to 
integrate our CP-based approach for key-bridging technique into the previous 
CP-based frameworks for finding distinguishers and introduced a general CP 
model to search for key-recovery-friendly distinguishers supporting both linear 
and nonlinear key schedules for the first time. 

In our experiments, we observed that the SAT-based method often outper- 
forms the others when we want to find only a feasible solution. For instance, 
executing the SAT method on a single core Intel Core i9 processor at 3.6 GHz, it 
takes less than a second to reproduce the GD attack on Enocoro-128v2 in [3], 
whereas finding the same result via the MILP-based method takes minutes. We 
had the same observation in applying Autoguess to reproduce the best previous 
GD attacks on SNOW1, SNOW2, SNOW3, KCipher2, etc. However, in some other 
applications, the Gröbner basis approach performs better. For instance, 
we observed that the Gröbner basis-based method performs very well in our 
applications for the key-bridging technique, while it also ensures the optimum 
output. Thus, it makes sense to integrate different encoding methods in one tool. 


Acknowledgments. The authors would like to thank Mohammad Ali Orumiehchiha 
and Siwei Sun for motivating discussions and helpful comments on an earlier version 
of the tool. 


Abstract. The recent KEMTLS protocol (Schwabe, Stebila and Wiggers, 
CCS’20) is a promising design for a quantum-safe TLS handshake 
protocol. Focused on the web setting, wherein clients learn server public- 
key certificates only during connection establishment, a drawback of 
KEMTLS compared to TLS 1.3 is that it introduces an additional round 
trip before the server can send data, and an extra one for the client 
as well in the case of mutual authentication. In many scenarios, includ- 
ing IoT and embedded settings, client devices may however have the 
targeted server certificate pre-loaded, so that such performance penalty 
seems unnecessarily restrictive. 

This work proposes a variant of KEMTLS tailored to such scenarios. 
Our protocol leverages the fact that clients know the server public keys in 
advance to decrease handshake latency while protecting client identities. 
It combines medium-lived with long-term server public keys to enable a 
delayed form of forward secrecy even from the first data flow on, and full 
forward secrecy upon the first round trip. The new protocol is proved to 
achieve strong security guarantees, based on the security of the under- 
lying building blocks, in a new model for multi-stage key exchange with 
medium-lived keys. 


Keywords: Key Exchange - Post-Quantum - Identity Protection - 
KEMTLS 


1 Introduction 


The Transport Layer Security (TLS) protocol is among the most widely deployed 
cryptographic protocols. It is used to securely access web pages, email servers, 
Internet-of-Things (IoT) gateways or even servers in Cooperative Intelligent 
Transport Systems [40] (C-ITSs). In the TLS handshake sub-protocol, a client 
and a server authenticate each other (at least the server to the client) and jointly 
establish a symmetric key that is then used in the record sub-protocol to privately 
communicate authenticated application data. The latest version of the protocol, 
standardized in 2018, is TLS 1.3 [35] and uses an ephemeral Diffie-Hellman key 
exchange to establish keys that remain secure even after a potential compromise 
of the parties’ long-term keys, i.e., enabling so-called forward secrecy. 


Post-quantum TLS. In anticipation of large-scale quantum computers, sev- 
eral candidates for a post-quantum version of the TLS handshake protocol have 
emerged. These for instance include the CECPQ2 experiment [29,30] by Google 
that combines X25519 ECDH with the NTRU-HRSS lattice-based key exchange 
in the TLS 1.3 handshake, or the Open Quantum Safe initiative [42] with proto- 
type integrations in the OpenSSL library of TLS 1.3 key exchange with hybrid 
security. 

A promising candidate in this area is the KEMTLS protocol recently proposed 
by Schwabe, Stebila, and Wiggers [38]. It is free of handshake signatures and only 
relies on key encapsulation to provide both key establishment and authentication 
in a quantum-safe way. The main idea is reminiscent of the OPTLS protocol [28] 
(which in turn inspired the TLS 1.3 handshake design): at its core are encapsu- 
lations against the respective partner’s public key, using the resulting secrets to 
establish a shared key. As the resulting shared key can only be recovered with the 
partner’s secret key, this approach implicitly authenticates the partner. Besides, 
to enable forward secrecy, the client also sends at the beginning of the protocol an 
ephemeral public key that the server encapsulates against to obtain an ephemeral 
contribution. The prototype implementation of KEMTLS showed that its band- 
width was over 50% lighter than that of a size-optimized post-quantum instanti- 
ation of TLS 1.3, and that it reduces the amount of CPU cycles by almost 90% 
compared to a speed-optimized post-quantum instantiation of TLS 1.3. 

However, the KEMTLS protocol only treats the classical web scenario in 
which the client has no prior knowledge of the server public key, although the 
client could in practice cache the server certificate during an initial handshake. In 
IoT or embedded-device settings, the server public key is often even hardcoded, 
e.g., in firmware. The client therefore knows the server public key ahead of time 
in many practical scenarios. This knowledge not only has the benefit of allowing 
the client to verify the server certificate only once before any handshake (thereby 
speeding handshakes and saving power for IoT devices), but could potentially 
lead to a protocol with fewer message round trips, which is in practice crucial 
to reduce network latency (also in the web setting) and power consumption. 

Indeed, in the KEMTLS protocol, the server cannot send application data 
before the client does, and the client can only send data after two round trips in 
the case of mutual authentication, i.e., it is a two-Round-Trip-Time or 2-RTT 
protocol. This contrasts with TLS 1.3, where the server can send data (e.g., a 
server banner or an IoT-hub certificate) from its first message flow and the client 
can do so after a single round trip even in the case of mutual authentication. 
The underlying reason is that each party must wait for the other’s public key 
to then encapsulate against it, thereby implicitly authenticating the latter, since 
there are no handshake signatures as in TLS 1.3. Scenarios in which the client 
knows the server public key from the beginning of the protocol hence promise 
to enable substantial performance improvements. 


No Forward Identity Protection in 1-RTT. In case the client must also 
authenticate herself to the server, as it is for instance necessary for IoT devices 
or vehicles in C-ITSs, the TLS protocol is expected to also provide identity pro- 
tection [26], namely that the client’s identity should only be recoverable by a 
server that is already authenticated. The client can of course leverage the server 
public key that it already knows to encrypt her certificate, but since there is no 
ephemeral contribution from the server yet, an adversary that compromises the 
server’s key could recover the client’s identity even after the handshake com- 
pleted. In other words, there would be no forward(-secure) identity protection. 

Despite the efficiency benefits of a 1-RTT protocol, forgoing forward identity 
protection altogether might be too great of a compromise, especially when pri- 
vacy is a primary concern. For instance, the European Telecommunications Stan- 
dards Institute identifies the high risk of user profiling as a main privacy challenge 
in IoT [16]. The US National Institute of Standards and Technology considers as 
a high-level risk mitigation “safeguarding the confidentiality ... of data... col- 
lected by, stored on, processed by, or transmitted to or from the IoT device” [17] 
and stated that an IoT device should have “the ability to use demonstrably secure 
cryptographic modules for standardized cryptographic algorithms ... to prevent 
the confidentiality ... of the device’s stored and transmitted data from being com- 
promised” [18]; here, the client identity belongs to such transmitted data. 

Nevertheless, to maintain client privacy (in a protocol using only key encap- 
sulation) even if the server long-term keys are later compromised, the client 
cannot send her certificate before the server has made an ephemeral contribu- 
tion in a first round trip. This means the client cannot be authenticated before 
the server encapsulates against her public key in a second round trip. There 
seems to be no way of fully leveraging the knowledge of the server public key to 
have a 1-RTT protocol while maintaining forward identity protection. 


1.1 Contributions 


The core contribution of this paper is a protocol (in Sect.3) that bridges the 
gap between forward identity protection and a 1-RTT protocol solely based on 
key encapsulation, under the assumption that the client knows the server public 
key at the start of the protocol (see Fig. 1 for a sketch). The main idea is to 
introduce semi-static public keys on the server side which the client also knows 
at the start of the protocol. These semi-static keys are periodically refreshed (e.g., 
once every other day), and if the corresponding secret key is not compromised 
before it expires, the client’s identity can no longer be recovered, even if the server 
long-term secret key is later compromised. In this sense, the protocol satisfies a 
delayed form of forward identity protection [8] without any extra round compared 
to a 1-RTT protocol without forward identity protection. As a side-effect, the 
protocol also allows for optional zero round-trip time (0-RTT) data with the 
same delayed forward secrecy, which the client can already send within its first 
flight without having to wait for the server. Since the semi-static keys are not 
assumed to be certified (the protocol would otherwise be impractical), they must 
be transmitted during an initial handshake that then consists of two round trips. 
The protocol takes care of this mechanism, and allows for semi-static keys to 
roll over between two time periods, so that servers can serve clients using both 
the key for the current and the next time periods. 

Section 4 presents a model that formalizes the properties expected from a pro- 
tocol involving semi-static keys, and Sect. 5 proves (in the reductionist framework 
and in exact-security terms) that the protocol does satisfy them under standard 
assumptions. The model in Sect. 4 is closely related to the multi-stage key exchange 
model [19] proposed for TLS 1.3 [13,14] and that for KEMTLS [38], but it also 
accounts for the semi-static keys and their lifetime. Section 5 then shows that the 
protocol achieves the intended security levels across the various stages of the hand- 
shake, relying only on standard-model assumptions. Section 6 compares the pro- 
tocol to alternative approaches and highlights its advantages. Section 7 discusses 
implementation choices as well as a prototype implementation and Sect. 8 dis- 
cusses benchmarking results. As expected, caching certificates incurs significant 
performance gains as it reduces the handshake time by at least 45%, and the pri- 
vacy gains from semi-static keys come at negligible performance costs. 


Concurrent Work. In concurrent work, Schwabe, Stebila, and Wiggers [39] also 
consider a variant of the KEMTLS protocol, called KEMTLS-PDK, that lever- 
ages prior knowledge of peer public keys. Similarly to this work, they show how 
pre-distributed public keys can lead to reduced round trips and bandwidth for 
the handshake. Their work further explores the performance characteristics for 
various NIST post-quantum KEM candidates. This work in contrast focuses on 
identity privacy and forward secrecy: beyond leveraging pre-distributed long- 
term keys, our protocol additionally employs in-band-distributed, semi-static 
keys to achieve (delayed) forward secrecy for the first data flow including the 
client’s identity and, optionally, 0-RTT data. 


2 Preliminaries 


Notation. The security parameter is denoted $\lambda$ and is encoded in unary when 
given as input to algorithms. For an integer $n \ge 1$, $[n]$ denotes the set $\{1, \ldots, n\}$. 
The notation $y \leftarrow A(x)$ or $A(x) \rightarrow y$ means that a deterministic algorithm $A$ 
runs on input $x$ and returns $y$; for probabilistic algorithms the notation $\leftarrow_\$$ resp. 
$\rightarrow_\$$ is used instead. 


Symmetric Primitives. The presented protocols rely on classical symmetric 
primitives, including collision-resistant hash functions, pseudorandom functions, 

Client Server 


(pke, ske) <—g KEMe.KeyGen (>) 

(Ks, Cs) +s KEMs.Encaps(pk,); (Kt, C2) <3 KEMs.Encaps (pk) 
Ko < KDF (K$) 

Kı, Ki, K2 + KDF (Ks, K$) 


pke, AEAD xo (Cs), Cf, AEADx, (cert[pk,]), AEADx: (opt. 0-RTT data) 


Ki + KEM, .Decaps (skf, C$) 
Ko < KDF( KŻ) 
Ks < KEM,.Decaps (sks, Cs) 
Kı, Ki, K2 — KDF (Ks, KS) 
(Ke, Ce) +s KEMe.Encaps (pk,) ; (Ke, Ce) <3 KEM-.Encaps (pk) 
K3,c, K3,c, K3,s, K3, — KDF (Ks, K}, Ke, Ke) 
Ce, AEADK, (Cc), AEAD x; , (key confirmation), AEAD xy , (app. data) 
Ke 4+ KEM¢.Decaps (ske, Ce) ; Ke <- KEMc.Decaps (ske, Ce) 
K3,c, K3,c, Ks,s, K3, — KDF (Ks, KS, Ke, Ke) 
AEAD ç, „(key confirmation), AEAD x: (app. data) 


Fig. 1. Sketch of the main protocol. 


message-authentication codes, and key derivation functions, formally defined in 
the full version [23]. 


Key Encapsulation Mechanisms. The protocol further relies on key encapsulation
mechanisms (KEMs), a public-key primitive which allows a party to send a
symmetric key to another party encrypted under the public key of the latter. A
KEM consists of a key generation algorithm
$\mathsf{KeyGen}(1^\lambda) \rightarrow_\$ (pk, sk)$ that generates a pair of
public and secret keys, an encapsulation algorithm
$\mathsf{Encaps}(pk) \rightarrow_\$ (K, C)$ which computes a symmetric key in a
set $\mathcal{K}$ and a ciphertext, and a decapsulation algorithm
$\mathsf{Decaps}(sk, C) \rightarrow K$ that computes a symmetric key on the input
of a secret key and a ciphertext.
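
To make the three-algorithm interface concrete, the following minimal sketch implements KeyGen, Encaps and Decaps as a toy Diffie-Hellman-style KEM. Everything in it is an illustrative assumption: the group (integers modulo the prime 2^255 - 19 with base 2), the SHA-256 key derivation and the function names are not taken from the paper, and the construction is neither secure nor post-quantum. It only shows the shape of the interface and the correctness requirement that decapsulation recovers the encapsulated key.

```python
import hashlib
import secrets

# Toy parameters for illustration only (assumption, not from the paper):
# multiplicative group modulo the prime 2**255 - 19 with base 2.
# This is NOT a secure or post-quantum KEM.
P = 2**255 - 19
G = 2


def keygen():
    """KeyGen: sample a secret exponent and return (pk, sk)."""
    sk = secrets.randbelow(P - 2) + 1
    pk = pow(G, sk, P)
    return pk, sk


def _derive_key(shared: int, ciphertext: int) -> bytes:
    """Hash the shared group element and the ciphertext into a 32-byte key."""
    data = shared.to_bytes(32, "big") + ciphertext.to_bytes(32, "big")
    return hashlib.sha256(data).digest()


def encaps(pk: int):
    """Encaps: return (K, C), a symmetric key and the ciphertext carrying it."""
    r = secrets.randbelow(P - 2) + 1
    ciphertext = pow(G, r, P)
    shared = pow(pk, r, P)
    return _derive_key(shared, ciphertext), ciphertext


def decaps(sk: int, ciphertext: int) -> bytes:
    """Decaps: recompute the symmetric key from the secret key and C."""
    shared = pow(ciphertext, sk, P)
    return _derive_key(shared, ciphertext)
```

A sender would call `encaps(pk)` and transmit only the ciphertext; the holder of `sk` obtains the same 32-byte key with `decaps(sk, ciphertext)`.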


3 Protocol 


This section presents a key-exchange protocol, specified in Fig. 2, with mutual
authentication that solely relies on KEMs for key establishment and
authentication between a client and a server. The protocol assumes that the
client has prior knowledge of the server certificate, as is often the case for
embedded or IoT devices and other applications of TLS. This, together with novel
insights, allows the client to send forward-secret and fully authenticated
application data after a single round trip, and the server from its first flow,
as in TLS 1.3. It also allows (optional) zero round-trip time (0-RTT) data to be
sent by the client along with its first flight of messages. In comparison, in
the KEMTLS protocol [38] the client can only send application data after two
round trips in the case of mutual authentication, and the server can only do so
from its second flow regardless of client authentication.


Building Blocks. The protocol involves three KEMs: $\mathsf{KEM}_e$ for
establishing ephemeral secrets and enabling forward secrecy, $\mathsf{KEM}_c$
for implicit client authentication, and $\mathsf{KEM}_s$ for implicit server
authentication. All three could be instantiated with the same scheme or be
chosen differently depending on various optimization factors. For instance,
$\mathsf{KEM}_e$ could be chosen so as to minimize the key-generation time and
alleviate client computation, whereas $\mathsf{KEM}_c$ and $\mathsf{KEM}_s$
could be selected as schemes with fast encapsulation even though key generation
might be long, with an even stronger computational-efficiency requirement for
the client than for the server.

In addition, the protocol uses Krawczyk’s hash-based key derivation function
HKDF [27] as the keystone of the key schedule, to extract (via HKDF.Extract)
randomness from the KEM-generated secrets and derive (via HKDF.Expand) stage
keys; HMAC [4] as message authentication code for explicit party authentication;
and a hash function H, e.g., SHA-256, to compute expansion labels for HKDF as
well as to compress the handshake messages before explicit authentication.
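
As a rough illustration of the two HKDF steps named above, the following sketch implements HKDF.Extract and HKDF.Expand on top of HMAC-SHA-256 with the Python standard library, following the generic construction of RFC 5869. The function names and the choice of SHA-256 are assumptions made for the example; the paper's prototype is written in Rust and is not reproduced here.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF.Extract: condense input keying material into a pseudorandom key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: an empty salt means HASH_LEN zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int = HASH_LEN) -> bytes:
    """HKDF.Expand: stretch a pseudorandom key into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]
```

In a key schedule of this kind, a KEM secret would be passed as `ikm` to `hkdf_extract`, and each stage key would come from `hkdf_expand` with a distinct `info` label.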


Outline. The protocol shares similarities with the KEMTLS protocol, which is 
itself modeled after the OPTLS protocol [28]. However, it goes beyond prior work 
to reconcile client privacy (even if server long-term keys are later compromised) 
and a 1-RTT handshake: it leverages server semi-static KEM keys which the 
client encapsulates against and mixes the result into the key schedule at the 
beginning of the protocol, so that only a party privy to the semi-static secret 
key can decipher the client identity. 


Key Lifetime. A pair of semi-static keys is only to be used in a given time period, 
e.g., a duration of two days, after which the server refreshes the pair. Though 
the privacy guarantees are not as strong as those of a 2-RTT handshake which 
uses fully ephemeral secrets to protect client certificates, they are still relevant 
in practice and it is a fair compromise for the efficiency benefits. 


Clocks. The server keeps track of time periods with an integer counter. Only
the server must maintain a clock, just to know when to refresh the keys. The
client need only store the latest semi-static public key it received from the
server along with the corresponding time period, which is indicated by the
server. This means that the protocol can even be used with clients that may not
have a clock, as is the case for some IoT devices.


Time-Period Transition. The server generates the keys for a time period before
its beginning and sends the public key to the client as part of a handshake
during a transition phase from the previous time period, e.g., the last hour.
During this transition phase, the server not only accepts handshake requests
with the current key, but also with the next one, so that the client can use the
next key as soon as it receives it.


\[
\begin{array}{l}
\textbf{Client: } \mathrm{cert}[pk_c],\ sk_c,\ \mathrm{cert}[pk_s],\ t_{s,c},\ pk_s^{t_{s,c}}
  \qquad \textbf{Server: } sk_s,\ t_s,\ sk_s^{t_s} \\[4pt]
(pk_e, sk_e) \leftarrow_\$ \mathsf{KEM}_e.\mathsf{KeyGen}(1^\lambda) \\
[\ldots] \\
(K_s, C_s) \leftarrow_\$ \mathsf{KEM}_s.\mathsf{Encaps}(pk_s) \\
[\ldots] \\
K_e \leftarrow \mathsf{KEM}_e.\mathsf{Decaps}(sk_e, C_e) \\
K_c \leftarrow \mathsf{KEM}_c.\mathsf{Decaps}(sk_c, C_c) \\[4pt]
\mathrm{IMS} \leftarrow \mathsf{HKDF.Extract}(\mathrm{dHS}, K_e) \\
\mathrm{dIMS} \leftarrow \mathsf{HKDF.Expand}(\mathrm{IMS}, \texttt{"derived"}) \\
\mathrm{MS} \leftarrow \mathsf{HKDF.Extract}(\mathrm{dIMS}, K_c) \\
\textbf{accept } \mathrm{SAHTS} \leftarrow \mathsf{HKDF.Expand}(\mathrm{MS}, [\ldots]) \\
[\ldots] \qquad \text{(stage 5, stage 6)} \\[4pt]
fk_s \leftarrow \mathsf{HKDF.Expand}(\mathrm{MS}, \texttt{"s finished"}) \\
\{\mathrm{SPK} := \mathsf{ServerPublicKey}\}^{*} : t+1,\ pk_s^{t+1} \\
\{\mathrm{EE} := \mathsf{EncryptedExtensions}\} \\
\{\mathsf{ServerFinished}\} : \mathrm{SF} \leftarrow \mathsf{HMAC}(fk_s, H(\mathrm{CH}, \ldots, \mathrm{EE})) \\
\textbf{abort if } \mathrm{SF} \neq \mathsf{HMAC}(fk_s, H(\mathrm{CH}, \ldots, \mathrm{EE})) \\
\textbf{accept } \mathrm{SATS} \leftarrow \mathsf{HKDF.Expand}(\mathrm{MS}, \texttt{"s app traffic"} \,\|\, H(\mathrm{CH}, \ldots, \mathrm{SF})) \\
(t_{s,c},\ pk_s^{t_{s,c}}) \leftarrow (t+1,\ pk_s^{t+1}) \\[4pt]
fk_c \leftarrow \mathsf{HKDF.Expand}(\mathrm{MS}, \texttt{"c finished"}) \\
\{\mathsf{ClientFinished}\} : \mathrm{CF} \leftarrow \mathsf{HMAC}(fk_c, H(\mathrm{CH}, \ldots, \mathrm{SF})) \\
\textbf{abort if } \mathrm{CF} \neq \mathsf{HMAC}(fk_c, H(\mathrm{CH}, \ldots, \mathrm{SF})) \\
\textbf{accept } \mathrm{CATS} \leftarrow \mathsf{HKDF.Expand}(\mathrm{MS}, \texttt{"c app traffic"} \,\|\, H(\mathrm{CH}, \ldots, \mathrm{CF}))
\end{array}
\]

Fig. 2. Protocol in the case of matching time periods.

In case the client does not connect to the server during this transition phase,
the client simply initiates the protocol with the latest known key (if any) in
addition to the server long-term key. The server then just rejects the
ciphertext encrypting the client certificate and returns the current public key,
following in spirit the HelloRetryRequest message sent in a TLS 1.3 handshake
upon configuration mismatch [35, Sect. 4.1.4]. The client can now send its
certificate anew, encrypted under a mixture of ephemeral KEM secret (instead of
the skipped semi-static secret) and long-term KEM secret. The two parties
otherwise follow essentially the same flow as in Fig. 2, leading to only a
one-time delay of one round trip to re-synchronize. For space reasons, we give
the protocol version for unmatching time periods in the full version [23].
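
The server-side bookkeeping described in the last paragraphs (an integer period counter, a pre-generated next key, and a transition phase in which both the current and the next key are accepted) can be sketched as follows. This is a hypothetical illustration rather than the authors' implementation: the class `SemiStaticKeyStore`, its method names, and the stand-in `toy_keygen` (which returns random bytes instead of a real KEM key pair) are all assumptions introduced for the example.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

KeyPair = Tuple[bytes, bytes]  # (public key, secret key)


def toy_keygen() -> KeyPair:
    """Stand-in for a KEM key generation (assumption: random bytes only)."""
    return os.urandom(32), os.urandom(32)


@dataclass
class SemiStaticKeyStore:
    keygen: Callable[[], KeyPair]
    period: int = 0                # current server time period
    in_transition: bool = False    # True while the next key is being announced
    keys: Dict[int, KeyPair] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.keys[self.period] = self.keygen()

    def begin_transition(self) -> None:
        """Generate the key pair for the next period ahead of its start."""
        if self.period + 1 not in self.keys:
            self.keys[self.period + 1] = self.keygen()
        self.in_transition = True

    def advance(self) -> None:
        """Enter the next period and erase the expired secret key."""
        if self.period + 1 not in self.keys:
            self.keys[self.period + 1] = self.keygen()
        del self.keys[self.period]
        self.period += 1
        self.in_transition = False

    def secret_for(self, client_period) -> Optional[bytes]:
        """Secret key for the period claimed by the client, or None if unmatched."""
        accepted = {self.period}
        if self.in_transition:
            accepted.add(self.period + 1)
        if client_period in accepted:
            return self.keys[client_period][1]
        return None  # unmatched period: the server falls back to a retry

    def announcement(self) -> Optional[Tuple[int, bytes]]:
        """(next period, next public key) to send during a transition, else None."""
        if not self.in_transition:
            return None
        return self.period + 1, self.keys[self.period + 1][0]
```

A return value of `None` from `secret_for` corresponds to the re-synchronization case: the server cannot decrypt the first flight and instead answers with its current public key.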


Protocol Notation. In Fig. 2, “MSG : M” denotes that message MSG is sent and
contains M, and “$\{\mathrm{MSG}\}_{\mathrm{stage}\ k}$ : M” denotes the AEAD
encryption of a message MSG containing M under an AEAD key derived from the
secret accepted at stage k (the derivation is not made explicit in the figure).
A star (*) as superscript indicates that the message is only sent during the
transition from the current server time period to the next.


Inputs. At the beginning of the protocol, in addition to its certificate
$\mathrm{cert}[pk_c]$ and secret key $sk_c$, the client holds a server long-term
certificate $\mathrm{cert}[pk_s]$ and the latest server semi-static key
$pk_s^{t_{s,c}}$ known to the client, in a time period $t_{s,c}$. By convention,
$pk_s^{t_{s,c}} := \bot$ and $t_{s,c} = -\infty$ if the client has never
obtained a semi-static key from the intended partner server. As for the server,
it is given as input a long-term secret key $sk_s$ and a semi-static secret key
$sk_s^{t_s}$ corresponding to the current server time period $t_s$. Note that
the long-term public keys are certified out of band by an external certification
authority. In contrast, the semi-static public keys are not assumed to be
certified.


Protocol Steps. The main protocol steps are as follows. 


— The client first generates a pair $(pk_e, sk_e)$ of ephemeral keys, and sends
the public key with a fresh nonce $n_c$ and the list of algorithms it supports
in a ClientHello message.

— The client then encapsulates against the server semi-static key and sends
the resulting ciphertext $C_s^{t_{s,c}}$ together with its time period
$t_{s,c}$ within a SemiStaticKEMCiphertext message. It uses the resulting
semi-static secret $K_s^{t_{s,c}}$ to compute an early secret ES, and from that
derive a first stage key, the early handshake traffic secret EHTS.

— The client next encapsulates against the server long-term key, sending the
resulting ciphertext $C_s$ AEAD-encrypted under EHTS. (This protects the
certified identity of the server from an active adversary with delayed forward
secrecy, in case $C_s$ may leak such information.) The resulting key $K_s$ is
mixed into the key schedule to obtain a handshake secret HS, which implicitly
authenticates the server.

— The handshake secret HS is used to compute server and client handshake
traffic secrets SHTS and CHTS. The client uses CHTS to AEAD-encrypt its
certificate in a ClientCertificate message. (This ensures that only a party
knowing both the server long-term and semi-static secret keys used can infer
information about the client’s identity.)

Additionally, an early traffic secret ETS is derived to optionally send
protected 0-RTT application data.

— When the server receives the ClientHello, SemiStaticKEMCiphertext and
ClientCertificate messages from the client, two cases arise: either the client
time period $t_{s,c}$ matches the current server time period $t_s$ or not.

Matching Time Periods. If $t_{s,c} = t_s =: t$ (or $t_{s,c} = t_s + 1$ during
the transition from $t_s$ to $t_s + 1$) as in Fig. 2, the server has the
semi-static secret key $sk_s^{t}$ and can thus compute EHTS, SHTS, CHTS, and
ETS, and recover $C_s$, the client certificate and potential early application
data.

* The server encapsulates against $pk_e$ and sends in a ServerHello message
the resulting ciphertext $C_e$, together with a fresh nonce and the algorithms
selected from the algorithms that the client supports.

* Next, the server encapsulates against the client public key $pk_c$ and
encrypts the resulting ciphertext $C_c$ under SHTS. (This prevents information
about the client’s identity from leaking through $C_c$.)

* Both parties now compute a master secret MS by mixing in the ephemeral and
client long-term KEM secrets $K_e$ and $K_c$. Secret $K_e$ enables forward
secrecy, $K_c$ implicitly authenticates the client.

* From MS, both parties compute (mutually) authenticated handshake traffic
secrets SAHTS and CAHTS for the server and the client, used to derive AEAD keys
to encrypt the remaining handshake.

* From MS, MAC “finished” keys $fk_s$ and $fk_c$ are further derived for
explicit authentication, as well as application transport secrets SATS and CATS
for application data encryption.

* The server explicitly authenticates by sending a “finished” message, a MAC
over the transcript under key $fk_s$, and can then send application data.

During a transition phase to the next time period, it also sends the public
key $pk_s^{t+1}$ for the next time period.¹ The client saves this key (and
discards $pk_s^{t}$) only after verifying the server MAC.

* Upon receiving the server “finished” message, the client explicitly
authenticates by also sending a MAC over the transcript under key $fk_c$, and
can then send application data.


¹ The server does so once per client; the client will then switch to the next
key for subsequent handshakes.

Unmatching Time Periods. If $t_{s,c} \neq t_s$ (and $t_{s,c} \neq t_s + 1$
during the transition from $t_s$ to $t_s + 1$), the server does not hold
$sk_s^{t_{s,c}}$ and cannot compute the early handshake traffic, the
server/client handshake traffic or the early traffic secrets (denoted EHTS',
SHTS', CHTS', and ETS'), and therefore cannot recover $K_s$, the client
certificate or any potential early application data. The server thus rejects
the first four stages.

The main idea in this case is close to that of a HelloRetryRequest in
TLS 1.3 [35]. The server’s response to the client does not contain a
$\mathsf{KEM}_c$ ciphertext, indicating that their time periods did not match,
but it does contain an ephemeral KEM ciphertext. The client can then decapsulate
the ciphertext, recover an ephemeral secret, and restart as in the case of
matching time periods; the now-established ephemeral secret essentially takes
the place of the semi-static one. The protocol is thereby delayed by a single
round trip. The details are given in the full version [23].
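
For the matching case, the order in which the four KEM secrets enter the key schedule can be sketched as below. Only the steps IMS, dIMS, MS, the labels "derived", "s finished", "c finished", "s app traffic" and "c app traffic", and the HMAC-based finished messages are taken from Fig. 2. The earlier steps (ES, HS and the derivation of EHTS, CHTS, SHTS, ETS) are assumed here to mirror the same extract-then-expand pattern, and their labels (`"e hs traffic"`, `"c hs traffic"`, `"s hs traffic"`, `"e traffic"`) are hypothetical placeholders. Transcript hashes are omitted from those assumed steps for brevity.

```python
import hashlib
import hmac


def H(*messages: bytes) -> bytes:
    """Transcript hash over the concatenated handshake messages (SHA-256)."""
    return hashlib.sha256(b"".join(messages)).digest()


def extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF.Extract with HMAC-SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def expand(prk: bytes, info: bytes) -> bytes:
    """Single-block HKDF.Expand (32-byte output), sufficient for this sketch."""
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()


def key_schedule(k_semi: bytes, k_s: bytes, k_e: bytes, k_c: bytes) -> dict:
    """Chain the semi-static, long-term, ephemeral and client KEM secrets."""
    es = extract(bytes(32), k_semi)              # assumed: early secret from K_s^t
    ehts = expand(es, b"e hs traffic")           # hypothetical label
    hs = extract(expand(es, b"derived"), k_s)    # assumed: mirrors the IMS/MS steps
    chts = expand(hs, b"c hs traffic")           # hypothetical label
    shts = expand(hs, b"s hs traffic")           # hypothetical label
    ets = expand(hs, b"e traffic")               # hypothetical label
    ims = extract(expand(hs, b"derived"), k_e)   # IMS <- Extract(dHS, K_e)
    ms = extract(expand(ims, b"derived"), k_c)   # MS <- Extract(dIMS, K_c)
    return {
        "EHTS": ehts, "CHTS": chts, "SHTS": shts, "ETS": ets, "MS": ms,
        "fk_s": expand(ms, b"s finished"),
        "fk_c": expand(ms, b"c finished"),
    }


def app_traffic_secret(ms: bytes, label: bytes, transcript_hash: bytes) -> bytes:
    """E.g. SATS <- Expand(MS, "s app traffic" || H(CH, ..., SF))."""
    return expand(ms, label + transcript_hash)


def finished(fk: bytes, transcript_hash: bytes) -> bytes:
    """Finished message: HMAC over the transcript hash under a finished key."""
    return hmac.new(fk, transcript_hash, hashlib.sha256).digest()


def verify_finished(fk: bytes, transcript_hash: bytes, tag: bytes) -> bool:
    """Constant-time check; a real party aborts the handshake on False."""
    return hmac.compare_digest(finished(fk, transcript_hash), tag)
```

The chaining makes the dependencies visible: a party lacking the semi-static secret cannot derive any value in the schedule, while a party lacking only the client secret can still derive the handshake traffic secrets but not MS or the finished keys.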


4 Security Model 


This section introduces the model to capture security of the key-exchange
protocol presented in Sect. 3. It is close to the model for authenticated key
exchange proposed by Dowling, Fischlin, Günther and Stebila [13,14] and that 
for KEMTLS by Schwabe, Stebila, and Wiggers [38]. Their models follow a line 
of work [19,22] concerned with multi-stage key exchange protocols in which keys 
are computed at multiple stages of each single protocol execution. Session-key 
indistinguishability originates from the seminal Bellare-Rogaway model [5]. Due 
to space restrictions, only the key properties captured by the model are presented 
here; the technical details are given in the full version [23]. 

In the security model, the adversary controls the network and can passively 
eavesdrop, modify and orchestrate the communication across several concurrent 
sessions of the protocol. The adversary can further expose long-term and semi- 
static secrets of honest parties as well as the keys established during protocol 
runs (individually per stage). The protocol is then deemed multi-stage secure 
if such an adversary cannot distinguish a key established at a stage of a non- 
compromised (“fresh”) session from a uniformly random key. 


Authentication. The model supports mutual authentication, as required in the 
scenario of IoT or embedded devices. For the authentication of each stage key, 
implicit and explicit authentication are distinguished. Implicit authentication 
refers to the property that the stage key can only be recovered by the intended 
partner, whereas explicit authentication guarantees that the partner actively 
participated in the protocol and also established a stage key. The authentication 
of a stage key can further be lifted from unauthenticated or implicit to explicit 
once a later stage of the protocol is accepted: a stage key can be retroactively 
explicitly authenticated.


Forward Secrecy. The model further covers forward secrecy, the notion that
stage keys remain secret even if the long-term keys involved in their
computation are later compromised. As the protocol in Sect. 3 introduces
semi-static keys (i.e., keys that are periodically refreshed) for servers in
addition to long-term keys, the notion of forward secrecy is here refined to
also take the compromise of such keys into account.

More precisely, the model considers two types of forward secrecy determined 
by whether the semi-static key used to compute a stage key may be corrupted. 
A stage key satisfies (full) forward secrecy if the adversary remained passive 
until the stage is accepted or did not corrupt the long-term key of the intended 
communication partner before the latter was explicitly authenticated. The semi- 
static key used to compute the stage key may be corrupted at any time. A stage 
key satisfies delayed forward secrecy if, in addition to the previous conditions, 
the adversary did not corrupt the semi-static key used to compute the stage 
key. In particular, if the long-term key of the intended partner is not corrupted 
before the semi-static key expires, the secrecy of the stage key is equivalent to 
that of a key satisfying full forward secrecy. This (informal) definition of delayed 
forward secrecy is related to Boyd and Gellert’s [8]. 
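
The informal conditions above can be read as two predicates over what the adversary did in a session. The sketch below is only one possible reading of this paragraph, not the formal freshness definition of the full version [23]; the type `Corruptions` and its field names are hypothetical, and the mapping of stages 1-4 to delayed and stages 5-8 to full forward secrecy is the one stated for the matching case in Sect. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corruptions:
    """Adversarial behavior relevant to one stage key (hypothetical model)."""
    passive_until_accept: bool                      # no active interference before acceptance
    long_term_corrupted_before_explicit_auth: bool  # partner's long-term key corrupted too early
    semi_static_corrupted: bool                     # semi-static key used for this key corrupted


def full_fs_conditions(c: Corruptions) -> bool:
    """Conditions under which a key with full forward secrecy stays protected."""
    return c.passive_until_accept or not c.long_term_corrupted_before_explicit_auth


def delayed_fs_conditions(c: Corruptions) -> bool:
    """Delayed forward secrecy additionally needs an uncorrupted semi-static key."""
    return full_fs_conditions(c) and not c.semi_static_corrupted


def stage_key_protected(stage: int, c: Corruptions) -> bool:
    """Stages 1-4: delayed forward secrecy; stages 5-8: full forward secrecy."""
    if not 1 <= stage <= 8:
        raise ValueError("the matching-case protocol has stages 1 to 8")
    if stage <= 4:
        return delayed_fs_conditions(c)
    return full_fs_conditions(c)
```

For instance, with a passive adversary that later corrupts both the long-term and the semi-static key, `stage_key_protected` returns `True` for stage 5 but `False` for stage 1, which is the gap between the two notions.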


Key Usage. The use of stage keys is also specified, i.e., whether a key is meant 
to be used internally within the protocol (e.g., to encrypt handshake traffic) or 
externally (for example to protect application messages). 


Replays. The model further captures that the initial, first-flight keys are 
replayable: an attacker may copy the client’s initial messages and send them 
to the server (again), leading to multiple server sessions sharing the same keys 
with that one original client. This is due to the key being derived without inter- 
action (in zero round-trip time) and hence with no active contribution from the 
server side. Following [14,20], the model distinguishes between replayable and 
non-replayable stages, catering for this situation (which would otherwise lead to 
a violation of partnering uniqueness), while still demanding that keys remain 
indistinguishable from random, even when replayable. 


5 Security Analysis 


This section discusses the security of the Sect. 3 protocol in the model from
Sect. 4, and shows that it achieves multi-stage security based on the IND-CCA 
security of the involved KEMs, PRF security of the key derivation functions, 
EUF-CMA security of HMAC, and collision resistance of the hash function. 
Only a summary of the results in the case of matching time periods is given here 
due to space constraints. 


Properties. The protocol satisfies the following properties in the case of
matching time periods (Fig. 2). It has 8 stages, of which the first four satisfy
delayed forward secrecy, the others full forward secrecy. Server and client are
implicitly authenticated from stage 2 resp. stage 5 on and explicitly
authenticated from stage 7 resp. stage 8 on. The keys of stages 1-3 and 5-6 are
used internally, to encrypt handshake traffic. The first four stage keys,
without active server contribution, are replayable; all other keys are not.


Theorem 5.1 (Multi-stage Security – Matching Time Periods). Let $\mathcal{A}$
be an adversary against the multi-stage security of the protocol in Fig. 2.
There exist explicit reduction algorithms to the respective security of each
protocol building block such that the advantage of $\mathcal{A}$ in the
multi-stage security game in the case of matching time periods is at most

[...]

with $n_{id}$, $n_{period}$, and $n_\sigma$ being the number of users, time
periods used across all servers, resp. sessions, $\epsilon^{\mathrm{coll}}_{H}$
being the probability that an algorithm given in the proof finds a collision for
H by running $\mathcal{A}$ as subroutine, and $\mathsf{KEM}_s$,
$\mathsf{KEM}_e$, and $\mathsf{KEM}_c$ being $\delta_s$-, $\delta_e$-, and
$\delta_c$-correct.


Proof (Sketch, cf. full version [23]). The proof proceeds via a sequence of
games, initially ruling out nonce collisions (via the birthday bound
$2^{-256} n_\sigma^2$) and hash collisions in honest sessions (based on the
hash-function collision resistance). It then applies a hybrid argument, reducing
the number of tests to a single one, losing a factor of at most the total number
of stages across all sessions, i.e., $8 n_\sigma$.

The proof then branches into several sub-cases based on the freshness condi- 
tions necessary to test a session (cf. the formal definition in the full version [23]). 
The main steps in each branch essentially consist in proving the following. 


1. One of the KEM-encapsulated secrets $K_s$, $K_s^t$, $K_e$ or $K_c$ used in
the tested session remains secret due to encapsulation being done by an honest
session and the corresponding KEM secret key remaining uncompromised. This
allows replacing that KEM key $K$ with a uniformly random value via a reduction
to the IND-CCA security of the corresponding KEM. Depending on the KEM type,
this step first requires guessing the corresponding peer, time period, and/or
partner session holding the KEM public key, inducing factors involving $n_{id}$,
$n_{period}$, resp. $n_\sigma$.

2. This in turn allows replacing all keys derived from the secret KEM key $K$
all the way up to the master secret MS and the keys derived from MS. These steps
can be argued via the PRF (or dual-PRF) security of the involved HKDF.Extract
and HKDF.Expand calls, replacing derived keys one level at a time with random
values. After these steps, tested keys are indistinguishable from random ones,
preventing the adversary from winning by testing a fresh session key.

3. As a result, any received HMAC value ensuring explicit authentication cannot
have been forged, as this would result in a valid MAC forgery under the (now
random) MAC keys $fk_s$/$fk_c$, contradicting the EUF-CMA security of HMAC.
This guarantees that the adversary cannot make a session maliciously accept.


An upper-bound over all sub-cases then yields the theorem statement. 


The multi-stage security bound for the protocol with unmatching time periods
is derived similarly to that in Theorem 5.1, except that there are twelve
stages to consider (implying a hybrid factor loss of $12 n_\sigma$ instead of
$8 n_\sigma$) and four more keys (EHTS' on the one hand, and SHTS', CHTS' and
ETS' on the other hand) derived (adding an extra
$2\,\epsilon^{\mathrm{PRF}}_{\mathsf{HKDF.Expand}}$ term in the bound). Due to
space constraints, the full proof and the detailed theorem statement for the
unmatching case are deferred to the full version [23].


6 Discussion 


Identity Protection. TLS 1.3 protects parties’ identities by following the 
SIGMA-I key exchange pattern of Krawczyk [26]. More specifically, it protects 
the server identity against passive attackers and the client identity against active 
attackers, the latter identity being revealed only after having seen a valid server 
signature. The KEMTLS protocol [38] carefully mimics these properties, achiev- 
ing identity protection for the server against passive attackers and for the client 
against active attackers. Client identity protection in KEMTLS comes with an 
additional half or full round trip (depending on the targeted authentication 
properties). The KEMTLS-PDK protocol [39], in reducing round trips, sends the
KEM encapsulation against the server’s static key in cleartext. Unless an
anonymous KEM [3,21,32] is deployed, this value might leak information about the
server’s identity.

Our protocol leverages the pre-loaded server certificate to reduce handshake
round trips while achieving stronger identity protection: it protects both
server and client identities against active attackers, both with delayed forward
secrecy through encrypting the client certificate and ClientKEMCiphertext $C_s$
under the server’s semi-static key (authenticated in a previous handshake).


On the Security Proofs. The security proofs are similar to those of the KEMTLS 
protocol, are given in the standard model and do not rely on any form of 
adversary rewinding. Existing techniques in the literature (e.g., Song’s “lifting 
lemma” [41]) can thus be used to prove the protocol secure against quantum 
adversaries as long as the underlying primitives are. 

However, the proofs are non-tight (with the precise losses spelled out in
exact-security terms) as they require guessing the test session as well as,
depending on the proof case, the contributive session or the identity of the
intended peer. The proofs can thus be understood as heuristic arguments for the
soundness of the protocol design. It is worth noting that except for very recent
work on TLS 1.3 [11,12], most proofs of deployed authenticated key-exchange
protocols are also non-tight.


Downgrade Resilience. The model in Sect. 4 does not capture algorithm negoti- 
ation although any practical deployment of the protocol would support multiple 
instantiations for each primitive. However, one can still informally argue that 
the downgrade resilience properties of the protocol in Sect. 3 are similar to those 
of the KEMTLS protocol. More precisely, an active adversary could in principle 
make a party choose an algorithm other than the one it would have used if the 
adversary were passive, but the adversary cannot make a party use an unsup- 
ported algorithm. Moreover, assuming that the security of the building blocks is 
not breached before the confirmation messages are received, the client and the 
server are guaranteed to share the same transcript which includes negotiation 
messages. In other words, full downgrade resilience [6,15] is satisfied once the 
other party is explicitly authenticated. 


Comparison with KEMTLS. The assumption that the client knows the server
public key from the onset of the protocol is precisely what allows the server to
send application data from its first message flow and reduces the handshake by a
full round trip compared to the KEMTLS protocol. It also implies that the client
need not verify the server certificate during the handshake, which speeds up the
handshake even further and reduces power consumption.

However, as explained in the introduction, in a KEM-based protocol that
achieves mutual authentication in a single round trip, an adversary could a
priori recover the client’s identity by corrupting the long-term key of the
server even after the handshake is completed (no forward identity protection),
as is the case, for instance, for the KEMTLS-PDK protocol [39]. The semi-static keys
introduced in this paper mitigate this privacy loss and ensure, without extra 
round trip, that the client’s identity cannot be recovered once the semi-static 
keys have expired. The lifetime of the semi-static keys now depends on the desired 
trade-off between efficiency and privacy: the shorter the lifetime is, the stronger 
the privacy guarantees are for the client and the heavier the computational 
burden is on (mainly) the server. 


Comparison with Session Resumption and Forward-Secret 0-RTT. TLS 1.3 spec- 
ifies a session resumption (pre-shared key/“PSK”) handshake, bootstrapping 
from symmetric secret keys that have been established in a prior connection 
and also enabling a 0-RTT mode. As also discussed in [39], the PSK handshake
has efficiency advantages (e.g., relying purely on symmetric cryptography) but
also downsides with respect to key management: symmetric keys need to be
frequently changed (requiring additional communication) and securely stored in
client memory. Our approach in contrast only stores (semi-static and long-term)
public keys of the server at the client, reducing the risk of compromise as well
as communication overhead.

The 0-RTT mode of TLS 1.3 enables clients to send application data in the 
first message flow, and thus reduce the handshake by a round trip compared to 
the standard mode. This requires servers to reconstruct secrets from previous 
sessions when receiving the clients’ first messages, i.e., the 0-RTT mode is a 
resumption mechanism. 

The standard resumption technique to achieve forward-secrecy and resilience 
to replay attacks consists in having servers store session caches (resumption 
secrets from all recent sessions) in local databases and issuing clients unique 
lookup keys that they use for their next connections. Similar techniques could 
a priori be used to reduce the KEMTLS handshake while maintaining forward 
identity protection, provided that the resumption handshake uses a KEM to 
achieve post-quantum security. The presented approach with semi-static keys in
contrast obviates the need for extra secure updatable storage on the client side
for resumption keys. It also allows the server to save storage by re-using
$sk_s^{t}$ with many clients; session caches can easily grow huge.

Aviram, Gellert, and Jager [1,2] proposed a different approach to forward 
secrecy based on puncturing techniques, improving over session caches in terms 
of server storage. Yet, at a 128-bit security level this easily requires tens of MB 
of server storage, compared to, e.g., a 2.342 kB single Kyber key pair with our 
protocol. 

The main benefits of our protocol over forward-secret session resumption 
are therefore in small storage overhead (mainly on the server side), not needing 
(expensive) updatable secure client storage, and reliance on standardized post- 
quantum KEM components. 


7 Implementation 


This section discusses the implementation choices for the handshake protocol 
in Sect. 3. Since certificates are pre-distributed and need not be verified during 
handshakes, the main performance bottleneck depends on the choice of underly- 
ing KEMs. The main protocol is subsequently denoted PDK-SS (pre-distributed 
keys with semi-static contributions) for simpler referencing. 


Choice of Primitives. The KEMs considered for implementation are among the 
finalists and alternates in the third round of the NIST Post-Quantum Standard- 
ization Process [33], with parameters chosen at security level 1 (roughly equiv- 
alent to the security of AES-128). The criteria of particular relevance in the 
IoT use case include the speed of cryptographic operations, the size of cipher- 
texts that may impact the handshake latency, and the size of the keys stored on 
devices and transmitted during the handshake. 

We compare three of the NIST Round 3 finalist KEMs that rely on hard- 
ness assumptions over structured lattices and achieve good performance in terms 
of speed and size. These are Kyber512 [37] with security relying on the Module
Learning with Errors (MLWE) problem, LightSABER [10] relying on the Module
Learning with Rounding (MLWR) problem and NTRU-HPS-2048-509 [9] with 
security relying on the NTRU problem. We also include Round 3 Alternate can- 
didate SIKE [24] which is based on supersingular isogeny Diffie-Hellman, using 
parameter set SIKEp434-compressed. Despite slower operations compared to its 
lattice-based counterparts, SIKE benefits from the smallest key and ciphertext 
sizes of remaining candidates in the NIST process. 

To verify (client) certificates, we combined these KEMs with Dilithium-II [31]
(with Kyber512 and LightSABER) and Falcon512 [34] (with NTRU), based on
similar assumptions. For the smallest-size instantiation based on SIKE, we
used Falcon, which has the smallest signatures of the finalists.


Prototype Implementation. To experimentally evaluate PDK-SS, we implemented 
it by modifying the prototype implementation of the KEMTLS protocols [38,39] 
based on Rustls [7], a TLS library written in Rust. The prototype integrates 
implementations of the post-quantum primitives from PQClean [25] and the 
Open Quantum Safe (OQS) library [42]. For all implementations we used AVX2- 
accelerated code. The implementation is available under permissive licences at 
https://github.com/AbuLSim/1RTT-KEMTLS. 


8 Benchmarking 


Table 1 compares the main protocol with other mutually authenticated hand- 
shake protocols, some of which also leverage cached leaf certificates. Even though 
these experiments were run on a powerful server and not on IoT devices, they 
clearly demonstrate the performance benefits of the main protocol. 


Methodology. We compare PDK-SS to TLS with cached certificates [36] (both 
TLS 1.3 using X25519/RSA2048 and post-quantum variants), and to KEMTLS, 
with [39] and without [38] pre-distributed keys (the former is denoted PDK in 
Table 1). Cached TLS is included for the sake of comparison to a real-world 
Internet protocol. 

We analyze the performance of the PDK-SS protocol in three cases: 


— the synchronized case PDK-SS, where the client and server share the same 
semi-static key; 

— the asynchronized case PDK-SS async, where the client and server have out-of-
sync copies of the semi-static key and so the server must send its key to the
client;

— the PDK-SS update case, where the client and server share the same semi- 
static server key but an update to the next semi-static epoch key is available. 


The numbers in each column of Table 1 represent the average time to reach
the corresponding stage of the protocol, measured in milliseconds over 60,000
handshakes for each scheme and each set of network parameters. The handshakes
were performed on an emulated network; the experiment code is included in the
source code repository. The server running the simulations was equipped with
two Intel Xeon Gold 6230 CPUs, each with 20 cores. The left-hand columns
were computed over a low-latency, high-bandwidth (30.9 ms round trip and
1000 Mbps) connection, and the right-hand columns over a high-latency,
low-bandwidth (195.5 ms round trip and 10 Mbps) connection. For each handshake,
we measured the time taken for the client to send its first request in the form
of application data, the client to receive the server response, the server to be
explicitly authenticated, and finally the server to receive the client finished
message. The time taken for the server to be explicitly authenticated is in bold
font as we view it as the most important metric for our use case.


Table 1. Average time in ms for mutually authenticated handshakes with cached leaf
certificates.

Mutually                            30.9 ms RTT, 1000 Mbps             195.5 ms RTT, 10 Mbps
authenticated               Client  Client  Server  Server    Client  Client  Server  Server
                              send   recv.   expl.   recv.      send   recv.   expl.   recv.
                              req.   resp.   auth.    CFIN      req.   resp.   auth.    CFIN
KEMTLS         SIKE-c        196.8   228.0   228.0   165.9     697.0   893.3   893.2   500.9
               MLWE/MSIS      95.0   126.2   126.2    64.1     598.1   794.2   794.2   401.6
               NTRU           95.1   126.3   126.2    64.2     594.8   791.0   790.9   398.4
Cached TLS     TLS 1.3        68.8   100.3    66.0    38.2     399.2   596.6   396.5   204.6
               SIKE-c        103.0   134.8   101.6    72.8     431.7   630.5   430.3   238.1
               MLWE/MSIS      64.3    95.9    63.7    33.8     400.3   619.4   399.7   224.7
               NTRU           66.0    97.8    64.6    35.7     397.9   596.7   396.5   204.2
PDK            SIKE-c        130.6   161.7   130.5    99.7     466.6   662.7   466.5   269.3
               Kyber          63.3    94.4    63.2    32.3     400.5   596.5   400.4   200.6
               NTRU           63.3    94.5    63.3    32.4     396.7   592.7   396.6   198.8
               SABER          63.4    94.5    63.3    32.5     399.3   595.3   399.2   200.4
PDK-SS         SIKE-c        126.8   157.8   126.7    91.9     474.1   670.2   474.0   276.5
               Kyber          63.5    94.6    63.4    32.5     402.0   598.3   401.9   201.5
               NTRU           63.5    94.7    63.5    32.6     397.6   593.6   397.5   199.4
               SABER          63.6    94.7    63.5    32.7     401.5   597.7   401.5   201.1
PDK-SS async   SIKE-c        170.6   201.7   170.6   129.7     672.6   868.7   672.5   475.1
               Kyber          94.7   125.9    94.7    63.8     614.7   810.8   614.7   403.0
               NTRU           94.8   125.9    94.7    63.8     597.5   793.5   597.5   398.0
               SABER          94.9   126.0    94.8    63.9     604.0   800.0   603.9   401.1
PDK-SS update  SIKE-c        127.5   158.5   127.4    92.5     474.1   670.2   474.0   276.5
               Kyber          63.5    94.7    63.5    32.6     402.1   598.4   402.0   202.2
               NTRU           63.6    94.7    63.5    32.6     398.1   594.1   398.1   200.0
               SABER          63.7    94.8    63.6    32.7     401.5   597.7   401.5   201.7


Analysis. Table 1 shows that PDK-SS (in the synchronized case), PDK and
cached TLS perform similarly. That is because they are all 1-RTT, and
the handshake time is dominated by the number of round trips since computation 
and transmission times are dwarfed by the network latency. The only exception 
is with SIKE as KEM, as its operations are an order of magnitude (milliseconds 
versus microseconds) slower than those of the other KEMs.

As for the asynchronized case, PDK-SS async compares most closely with
the original KEMTLS handshake (PDK-SS async is somewhat faster as clients 
do not verify server certificates); their additional round trip clearly impacts the 
overall handshake time as expected. More precisely, PDK-SS is 50 to 55% faster 
than KEMTLS in the low-latency setup, and 50 to 53% faster in the high-latency 
setup. 
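
The claim that round trips dominate the handshake time can be checked with a back-of-the-envelope model. The sketch below rests on assumptions that are not stated in the text: the time until the client sends its first request is approximated as the number of completed round trips multiplied by the RTT, one of those round trips is attributed to connection setup, and the round-trip counts (2 and 3) and the function name are illustrative.

```python
def estimated_time_ms(round_trips: int, rtt_ms: float) -> float:
    """Latency-only estimate: ignores computation and transmission time."""
    return round_trips * rtt_ms


LOW_RTT_MS = 30.9    # low-latency setting of Table 1
HIGH_RTT_MS = 195.5  # high-latency setting of Table 1

# Assumed round trips before the client's first request, including one
# assumed round trip for connection setup (not stated in the text).
ONE_RTT_HANDSHAKE = 2   # e.g. PDK-SS in the synchronized case
EXTRA_ROUND_TRIP = 3    # e.g. KEMTLS or PDK-SS async

for label, trips in [("1-RTT handshake", ONE_RTT_HANDSHAKE),
                     ("handshake with an extra round trip", EXTRA_ROUND_TRIP)]:
    low = estimated_time_ms(trips, LOW_RTT_MS)
    high = estimated_time_ms(trips, HIGH_RTT_MS)
    print(f"{label}: {low:.1f} ms (low latency), {high:.1f} ms (high latency)")
```

Under these assumptions the low-latency estimates are 61.8 ms and 92.7 ms, a few percent below measured "client send req." entries such as 63.3 ms and 95.0 ms in Table 1, which is consistent with network latency accounting for most of the handshake time.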

Overall, our experiments confirm that the privacy benefits of introducing 
semi-static keys come at a negligible performance cost.

Abstract. Onion services enable bidirectional anonymity for parties
that communicate over the Tor network, thus providing improved pri- 
vacy properties compared to standard TLS connections. Since these ser- 
vices are designed to support server-side anonymity, the entry points 
for these services shuffle across the Tor network periodically. In order 
to connect to an onion service at a given time, the client has to resolve 
the .onion address for the service, which requires querying volunteer 
Tor nodes called Hidden Service Directories (HSDirs). However, previous 
work has shown that these nodes may be untrustworthy, and can learn 
or leak the metadata about which onion services are being accessed. In 
this paper, we present a new class of attacks that can be performed by 
malicious HSDirs against the current generation (v3) of onion services. 
These attacks target the unlinkability of onion services, allowing some 
services to be tracked over time. 

To restore unlinkability, we propose a number of concrete designs that 
use Private Information Retrieval (PIR) to hide information about which 
service is being queried, even from the HSDirs themselves. We examine 
the three major classes of PIR schemes, and analyze their performance, 
security, and how they fit into Tor in this context. We provide and eval- 
uate implementations and end-to-end integrations, and make concrete 
suggestions to show how these schemes could be used in Tor to minimize 
the negative impact on performance while providing the most security. 


Keywords: Tor · Onion Services · Unlinkability · PIR


1 Introduction 


Tor provides anonymity to millions of users accessing the Internet every day [28].
However, Tor can also be used to provide this same protection to hosts of content,
resulting in bidirectional anonymity (or pseudonymity). This is achieved through
the use of Tor onion services¹ [11]. Communication over Tor is done using Tor
circuits. A circuit is a chain of typically three relay nodes through which traffic,
encrypted in layers, is sent. The first node in a circuit is a client’s guard node. In
order to prevent certain attacks, a client will try to use the same guard node for
every circuit it builds over the course of several months [10]. In order to communicate
with an onion service, the client must learn the location of a Tor relay that
has an open circuit to the onion service, called an introduction point. Each onion
service typically has multiple introduction points distributed across the Tor network,
which maintain circuits that connect to the onion service. Clients use these
introduction points to inform the onion service of a rendezvous point that the onion
service and client can communicate through via Tor circuits. In order to start this
process the client must first obtain a list of the introduction points an onion service
uses. This is done by querying the hidden service directories, or HSDirs. HSDirs
assist the client in the task of translating the .onion address of an onion service
into its list of introduction points. Onion addresses are encodings of the long-term
identity public key owned by the onion service. For example, an onion address looks
like: vww6ybal4bd7szmgncyruucpgfkqahzddi37ktceo3ah7ngmcopnpyyd.onion.
How this address is distributed to users varies according to the onion service in
question.


1 Onion services were originally called hidden services, and hence some of the related
nomenclature still uses the word “hidden” instead of “onion” (for example, “HSDir”).


An extended version of this paper is available [12].

In version 3 of the onion services protocol, which we focus on in this paper, the 
onion address is used to query the HSDirs by first translating the original public 
key to a new ‘blinded’ public key, which changes at a regular interval (currently 
one day). The client uses the blinded public key to query the HSDirs, who 
provide the client with the descriptor associated with that blinded public key. 
These descriptors are encrypted under a symmetric key that can be derived from 
the onion address and contain a list of the introduction points. Each descriptor 
is held by a pseudorandom subset of Tor relays with the HSDir flag enabled. 
The mapping of which relays hold which descriptors changes over time, and is 
determined by a variety of inputs and system parameters, including the blinded 
public key of the onion service, the identity of the relay, and a shared random 
value distributed across the network; this shared randomness makes it hard for a 
malicious adversary that intends to censor an onion service to a priori compute 
and target the relays that will be used for serving the descriptors of an onion 
service in the future. We describe this process in detail in Sect. 2.1. 

Tor has deprecated version 2 of the onion service protocol as of July 2021. 
An explicit goal of version 3 of the onion services protocol is that it should be 
difficult for the HSDirs to know i) which onion services they hold descriptors 
for, and ii) which onion services are being accessed when they are queried [32]. 
In version 2 of the protocol, the permanent public key associated with the onion 
service was contained within the descriptor, allowing HSDirs to identify the onion 
services they hold descriptors for. To protect against this, in version 3 the HSDirs 
hold descriptors indexed by a blinded public key, and since the identity public 
key (the onion address) cannot be recovered from the blinded key, the HSDirs
cannot link an onion service descriptor with the underlying onion service unless
they already know the identity public key for the onion service. This process of
key blinding provides the security property of “unlinkability”, which states that
for an adversary who only observes blinded public keys and signatures under 
those keys, it is cryptographically impossible to pair two blinded keys as having 
the same underlying identity key.

However, in many cases it is reasonable to expect that the HSDir may know
the identity public key. For example, many onion services widely distribute their 
.onion address so that anyone can access them, or a malicious adversary trying 
to deanonymize an onion service may get the public keys in some other way. 
Any HSDir can check if they hold the descriptor for an onion service that they 
know the identity public key for, simply by deriving the blinded public key 
and checking the descriptors they hold. In this work we consider how an HSDir’s 
ability to connect incoming descriptor queries to blinded public keys impacts the 
privacy of onion services in Tor. We find that the information HSDirs have access 
to puts them in an advantageous position for launching attacks that can harm 
the anonymity of both onion services and clients connecting to those services. In 
particular we find that for onion services that wish to remain unknown, but are 
relatively popular within a community, it may be possible to violate unlinkability. 
To prevent this source of information being available, we explore the integration 
of Private Information Retrieval (PIR) into the descriptor lookup process. 


1.1 Related Work 


The idea of using PIR to mask the relative popularity of onion service queries 
from (version 2) HSDirs was mentioned in a blog post by Kadianakis in 2013 [18]. 
This blog post outlined various deficiencies in the way onion services worked (at 
the time), as well as proposing research directions for the scientific community 
to investigate to address these problems. In particular, while the post suggested 
using PIR as a possible solution to this problem, it did not investigate what 
kinds of PIR would be ideal or propose how the PIR schemes would actually be 
integrated into Tor. Ours is the first work that characterizes attacks against the 
newer version 3 onion services, explores the design space of a PIR-based solution 
to address it, and provides an implementation to demonstrate the effectiveness 
of those designs. 

Other integrations of PIR into Tor have been explored before [23,26], but 
the focus in those works was on using PIR for finding nodes to build circuits, 
and not for onion service queries, although the two do share similar interests. 
Consideration of the possibility and the ramifications of malicious HSDirs has 
been addressed in some works before [16,22,25], but no work has yet explored 
whether knowledge of the distribution of queries made to an HSDir could be 
used to aid attacks that deanonymize clients or onion services. 


Our Contributions. 


1. We provide a description of the v3 onion service lookup process, which is key 
to the remainder of the text in Sect. 2. We then analyze the leakage induced 
by the descriptor lookup mechanism of v3 onion services, and propose attacks 
targeting both clients and onion services that leverage this leakage. 

2. In Sect. 3 we discuss the variants of PIR that could solve this problem, and 
address the challenges of integrating them into a complete end-to-end solution 
for Tor’s onion services, while also highlighting a network-level optimization
that saves an additional network round-trip that PIR would otherwise introduce.

3. We analyze the different PIR schemes proposed to compare the privacy guar- 
antees provided by each in the context of Tor, using enumerative techniques 
to provide an upper bound on the probability an adversary is able to com- 
promise our multi-server PIR system in Sect. 4. 

4. Finally, in Sect. 5 we provide microbenchmarks for all of the defences we propose.
We also implement and evaluate an end-to-end integration
for the best solution from our microbenchmarks on the live Tor network. Our 
results demonstrate that the performance overhead (or effect on time to load 
an onion service for clients) of our proposed defence is negligible. 


2 Attacks 


In this section we describe various avenues of attack enabled by the metadata 
currently provided by query lookups. We start with an overview of the lookup 
process, so that we can motivate our adversarial model and explain the attacks. 
Next we consider two broad classes of attacks: those targeting clients, and those 
targeting hidden services. For each class, we argue how a malicious HSDir may 
leverage the information of the distribution of query lookups to gain information 
on a target or track them over time. 


2.1 Tor and Hidden Service Directories 


The Tor network is run by thousands of volunteer nodes, called relays. As of 
January 2022, there are roughly 7000 relays that forward traffic for the Tor net- 
work [28]. These relays are listed in the Tor network consensus, which is a docu- 
ment generated by the nine Tor network directory authorities once per hour [29]. 
This consensus lists some global parameters conveying information about how 
Tor clients and relays should behave, and each relay listed in the consensus can 
also have flags indicating what properties the relay has. Currently, roughly 4000 
relays have the “HSDir” flag, indicating that the relay will hold descriptors for 
onion services and deliver them to clients. Time is divided into epochs, with the 
size of the epoch being a consensus parameter of the Tor network. Currently, 
the length of an epoch is one day. 

The HSDirs collectively store the onion service descriptors in a distributed
hash table. Each epoch, each node has a separate index value, denoted
hsdir_index. These indices are unpredictable and uncontrollable by the HSDirs.
By ordering these indices, and looping back at the end, we can form a ring of the
HSDirs. For redundancy, each descriptor is held by multiple HSDirs. To determine
which HSDirs hold which descriptors, we can calculate hs_index_i values
for the onion service, where i ranges from 1 to hsdir_n_replicas (a parameter
given in the consensus, currently 2). These hs_index_i values are determined
by the blinded public key of the onion service for that epoch, as well
as the index i and a few consensus parameters. The descriptor is uploaded to
the hsdir_spread_store (currently 4) HSDirs whose hsdir_index values come
directly after the hs_index_i values in the HSDir ring. In this way, the descriptor
is replicated across the hash ring multiple times (currently 8) in each epoch, so
that a client wishing to access the descriptor has a variety of HSDirs that may
be queried, improving the privacy and availability of the onion service.
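
The placement rule described above can be sketched as follows. This is a simplified illustration and not Tor's implementation: the helper _h, the exact hash inputs of hsdir_index and hs_index, and the choice to skip relays that were already selected for an earlier replica are assumptions made for the example. Only the parameter values 2 and 4 are taken from the text.

```python
import bisect
import hashlib

HSDIR_N_REPLICAS = 2     # consensus value quoted in the text
HSDIR_SPREAD_STORE = 4   # consensus value quoted in the text


def _h(*parts: bytes) -> bytes:
    """Hypothetical stand-in hash over length-prefixed inputs."""
    digest = hashlib.sha3_256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def hsdir_index(relay_id: bytes, shared_random: bytes, epoch: int) -> bytes:
    """Ring position of a relay for one epoch (simplified inputs)."""
    return _h(b"node-idx", relay_id, shared_random, epoch.to_bytes(8, "big"))


def hs_index(blinded_key: bytes, replica: int, epoch: int) -> bytes:
    """Ring position of one replica of a service's descriptor (simplified)."""
    return _h(b"store-at-idx", blinded_key,
              replica.to_bytes(8, "big"), epoch.to_bytes(8, "big"))


def responsible_hsdirs(blinded_key, relay_ids, shared_random, epoch):
    """Return the relays that would store the descriptor in this epoch."""
    ring = sorted((hsdir_index(r, shared_random, epoch), r) for r in relay_ids)
    positions = [index for index, _ in ring]
    chosen = []
    for replica in range(1, HSDIR_N_REPLICAS + 1):
        # First relay whose index comes directly after hs_index_i.
        start = bisect.bisect_right(positions, hs_index(blinded_key, replica, epoch))
        picked, offset = 0, 0
        while picked < HSDIR_SPREAD_STORE and offset < len(ring):
            relay = ring[(start + offset) % len(ring)][1]  # wrap around the ring
            offset += 1
            if relay not in chosen:
                chosen.append(relay)
                picked += 1
    return chosen
```

With at least 8 relays on the ring, the function returns 2 × 4 = 8 distinct relays, matching the replication count given in the text.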

The Tor metrics site [28] provides some sense of the current scale of onion 
service usage. It reports around 4000 HSDirs (Tor nodes with the HSDir flag) in 
the network. Since their deprecation, the number of v2 onion services has dwin- 
dled to around 25 thousand, while the number of v3 onion services has steadily 
increased since tracking began (in September 2021), to around 700 thousand 
today. Given the number of HSDirs, the number of unique services, and the 
number of times a descriptor is replicated, we arrive at a rough estimate of 
700000 × 8/4000 = 1400 descriptors on average per HSDir.
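
The estimate can be reproduced directly. The function name below is hypothetical, and the inputs are the rounded figures quoted in the text.

```python
def average_descriptors_per_hsdir(num_services: int,
                                  copies_per_descriptor: int,
                                  num_hsdirs: int) -> float:
    """Average number of descriptors stored by each HSDir."""
    return num_services * copies_per_descriptor / num_hsdirs


# 2 replicas, each stored on 4 HSDirs, gives 8 copies per descriptor.
print(average_descriptors_per_hsdir(700_000, 2 * 4, 4_000))  # 1400.0
```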

For the remainder of this section, we want to consider the capabilities of a 
malicious entity willing to act as an HSDir. To model this, we consider that the 
Tor network has n relays with the HSDir flag. We envision that our adversary 
controls a of these relays. This adversary can see all incoming HS lookup queries 
on the HSDirs that they control. As we will see, even with this simple model, the 
adversary can draw conclusions and make inferences about both clients and onion 
services that go beyond what Tor allows for from other nodes in the network. 


2.2 Attacks Targeting Clients 


An adversary hoping to deanonymize a Tor user who uses onion services is placed 
in a relatively powerful position in the network. When a client resolves an onion 
service descriptor lookup, they connect to the HSDir via a circuit. Hence if an 
adversary in addition to controlling the HSDir, controls even the middle node 
of this circuit to the HSDir, they learn both the client’s guard relay and the 
blinded public key of the service being connected to.²

For a service that widely distributes their .onion address, this gives the
adversary the client’s entry point to the network (the guard relay) and their
final destination (the onion service). As clients maintain their guard node for a
long period of time (currently up to six months) [13,30], the guard relay itself
provides substantial information that can allow a malicious actor to trace a client
over time. Combined with the information of the final destination, this can lead
to powerful epistemic attacks [7,8].


2 Of course if the adversary controls the guard relay, they have an even stronger
attack vector, and will learn the client’s true IP address. However it is easier for an
adversary to control middle nodes, since attaining guard status for a relay requires
uptime on the order of several weeks, and clients do not often select new guards.


2.3 Attacks Targeting Onion Services


The introduction of blinded public keys in version 3 of the onion services protocol
was intended to provide better anonymity properties for onion services against
malicious HSDirs. Blinded public keys cannot be traced back to the identity
public key, and the blinded public key changes with each epoch; therefore, in 
theory onion services cannot be tracked by HSDirs across epochs. Cryptograph- 
ically, this is formulated as unlinkability, which states that after observing many 
public keys and many signatures under those public keys, an adversary cannot 
do better than guessing to link two blinded public keys as being derived from 
the same identity key. In this subsection, we argue that while the cryptography 
used for blinded key schemes is solid, these guarantees do not extend to all onion 
services in practice because of how descriptor lookups are resolved. 

To track an onion service over time, an HSDir can consider the distribution 
of queries made to each service over time. Different services are likely to have 
radically different distributions of queries. By identifying two blinded public keys 
that received a similar distribution of queries over the course of an epoch, an 
adversary can ascertain with a reasonable degree of confidence that the two 
blinded keys correspond to the same identity public key. The challenge in this 
setting is that the database is distributed, and hence the adversary’s view is 
limited to a fraction of the total set of queries made within an epoch. This 
fraction is defined by the adversarial power a and the total number of nodes 
n. This notion of building a ‘profile’ for an onion service based on the query 
distribution is simply the starting point of our attack, and we will refer to it as the 
weak variant of the attack. A truly malicious adversary has several other sources
of information available to them that strengthen the profiles constructed. For a
given onion service, additional sources of metadata that a malicious HSDir could 
leverage include (i) the set of guard nodes that make the HSDir lookup requests, 
(ii) the frequency distribution of lookup requests from the aforementioned set of 
guard nodes, or (iii) the timings of lookup requests within an epoch. 

There are other information channels possibly available to an adversary as 
well, such as considering correlations between the timing of queries to cross- 
linked onion services. Nonetheless, the common underlying element that makes 
the attack feasible is the ability of an adversarial HSDir to infer which of its 
onion service descriptors is being looked up in a request, allowing them to link 
the metadata of the request to that particular onion service. The solutions that
we propose in Sect. 3 prevent the attacks outlined above.


3 PIR for Descriptor Lookups 


To prevent the kinds of attacks established in Sect. 2, we need to prevent mali- 
cious HSDirs from learning which descriptor is being queried by a user. As a 
general approach, the obvious tool for this requirement is Private Information 
Retrieval (PIR). However, there is a large research gap between the simple idea of 
using PIR and its actual integration into the descriptor lookup process. We consider
the three approaches of multi-server PIR, single-server PIR using computational
assumptions (CPIR), and single-server PIR using hardware assumptions.
For the remainder of the paper, we show how these different PIR schemes can
be integrated into Tor to prevent the statistical attacks we have shown. Single- 
server PIR does not change how clients decide which HSDir to query from. The 
structure of the hash ring, how the client decides where to query from, and the 
logic the onion service uses to decide where to upload can all stay the same, 
while multi-server PIR does introduce significant changes to this structure. We 
provide a complete system design for how to integrate each of these into Tor, 
and explore the advantages and disadvantages of each approach. Integrating a 
PIR scheme into any application context poses several challenges of bridging the 
rigid semantics of a PIR scheme with the underlying architecture: 


1. Commonly, records in PIR schemes are stored as logical arrays and are 
indexed by an integer; in order to retrieve a particular record the client has 
to query for the corresponding index of this record. However in our context 
of onion service descriptors, the data to be queried privately is a key-value 
store, and we discuss later in this section how to bridge this gap. 

2. PIR schemes are computationally expensive and hence in order to ensure that 
integrating PIR guarantees into queries does not impact the performance 
of the entire application we need to ensure that PIR queries are handled 
asynchronously. In the extended version of this paper [12, App. A] we give 
the technical details of how we extend Tor’s program architecture to support 
asynchronous PIR queries on both the client and server side. 

3. In order to even construct a PIR query, the client needs to a priori know the 
parameters of the PIR scheme it is interacting with; this will induce a per- 
formance penalty of an additional round trip of communication for learning 
those parameters. Note that the delay introduced by the additional round trip 
directly impacts the time for an onion service’s page to start rendering, which 
is an important user-experience metric to minimize. We provide an optional 
optimization that can remove this additional round trip in the extended ver- 
sion [12, App. B] by leveraging the fact that these parameters can be publicly 
published. 


Single-server PIR. In a single-server scheme, a client queries a database by 
sending an encrypted version of the index that they want to retrieve data for. The 
server performs some computation over the database, and returns an encrypted 
response without learning any information about the index. This goal can be 
accomplished either by using encryption with strong mathematical properties 
like fully homomorphic encryption, as in XPIR [1] or SealPIR [2], or by using 
secure hardware, as in ZeroTrace [27]. 


Multi-server PIR. Multi-server schemes instead have the database distributed
across several servers. The client must query these servers to obtain some data
that can be recombined to obtain the query result. These schemes rely on non-collusion
assumptions; if the queried servers collude with each other, they can
determine what a client has queried. In a multi-server PIR scheme, there are
ℓ servers that can be queried. Each holds the same data, indexed in the same
way. Clients query each one with a separate query, and can then recombine the
responses to extract the desired data. In simple PIR schemes, such as Chor’s [6],
the client only succeeds if each of the ℓ servers responds correctly. However, more
robust schemes, such as those of Goldberg [15] and of Devet et al. [9], have since
been developed that generalise this. In these schemes, only k of the ℓ servers
need respond, and up to v of the servers may deviate from the protocol, and the
client will still be successful in extracting their desired query.
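
As a concrete illustration of the simplest case, the sketch below implements a two-server XOR-based scheme in the style of Chor et al. [6]. It assumes equal-length records and two servers that do not collude. The function names are hypothetical, and this is not the robust construction discussed above: both servers must answer correctly.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(num_records: int, wanted: int) -> tuple[list[int], list[int]]:
    """Build one selection vector per server; they differ only at `wanted`."""
    first = [secrets.randbelow(2) for _ in range(num_records)]
    second = list(first)
    second[wanted] ^= 1
    return first, second


def answer(database: list[bytes], query: list[int]) -> bytes:
    """Server side: XOR together every record selected by the query."""
    result = bytes(len(database[0]))
    for record, bit in zip(database, query):
        if bit:
            result = xor_bytes(result, record)
    return result


def reconstruct(answer_one: bytes, answer_two: bytes) -> bytes:
    """Client side: all records except the wanted one cancel out."""
    return xor_bytes(answer_one, answer_two)


if __name__ == "__main__":
    db = [f"descriptor-{i:02d}".encode() for i in range(8)]
    q1, q2 = make_queries(len(db), wanted=5)
    print(reconstruct(answer(db, q1), answer(db, q2)))
```

Each query on its own is a uniformly random bit vector, so a single server learns nothing about the wanted index; the two servers together can recover it by comparing their queries, which is why the non-collusion assumption matters.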

We next sketch a design for a process to distribute a descriptor across ℓ servers
so that multi-server PIR schemes are possible. Our system needs to allow for a
single database to be distributed (identically) across multiple servers. When a
client wishes to query for a descriptor, they must be able to figure out which
servers can serve their query. We calculate the hsdir_index_i values as in the
current Tor specification; however, previously these values pointed to the start
of a sequence of hsdir_spread_store nodes, any of which could be queried for
the desired descriptor. To support ℓ-server PIR, these will instead point to the
start of the sequence of ℓ servers that will be used for the protocol.

When a circuit is constructed in Tor, basic precautions are taken to try and 
ensure that the routers in the path actually represent distinct, non-colluding 
entities. Specifically, the Tor Path Specification [31] outlines several constraints 
for building a path, which include the following: 


— If two routers list each other in the “family” entry of their descriptors, they 
are in the same family and should not be in the same path. 

— Two routers in the same /16 subnet should not be in the same path. 

— Non-running and non-valid routers should not be in a path. 


We can employ the same principles for path selection for the purpose of choosing
the ℓ servers for multi-server PIR. Of course, any malicious adversary can simply
ensure that routers they control do not list each other and are not in the same 
/16 subnet. These restrictions do not stop such adversaries, but take precautions 
to prevent incidental collusion by routers. We can take the exact same approach 
for choosing the HSDirs that will process a multi-server PIR query. 

When a client wants to fetch a descriptor using multi-server PIR, it chooses a
random index i from {1,...,hsdir_n_replicas} and computes hsdir_index_i for
the hidden service. It then locates the ℓ next valid HSDirs who are all in different
families and /16 subnets whose dir_index values come after hsdir_index_i. The
client can then engage in the multi-server PIR protocol with these ℓ servers. If
this protocol fails for any reason, such as too many of the servers being unavailable,
the client simply selects a new i and tries again.³
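
The selection step can be sketched as a walk around the ring that skips unusable or conflicting relays. The Relay type, its fields, and the function names below are hypothetical, and the ring is assumed to be already sorted by index. The constraints mirror the three path rules listed above.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class Relay:
    name: str
    ipv4: str
    family: frozenset = frozenset()  # names this relay declares as family
    running: bool = True
    valid: bool = True


def subnet16(relay: Relay) -> ipaddress.IPv4Network:
    """The /16 network containing the relay's address."""
    return ipaddress.ip_network(f"{relay.ipv4}/16", strict=False)


def same_family(a: Relay, b: Relay) -> bool:
    """Two relays are in the same family only if they list each other."""
    return b.name in a.family and a.name in b.family


def pick_pir_servers(ring: list, start: int, num_servers: int):
    """Walk the ring from `start`; return num_servers relays, or None."""
    chosen = []
    for offset in range(len(ring)):
        candidate = ring[(start + offset) % len(ring)]
        if not (candidate.running and candidate.valid):
            continue
        if any(same_family(candidate, c) or subnet16(candidate) == subnet16(c)
               for c in chosen):
            continue
        chosen.append(candidate)
        if len(chosen) == num_servers:
            return chosen
    return None  # caller would pick a new replica index i and retry
```

Returning None when too few suitable relays exist corresponds to the failure case in which the client selects a new i and tries again.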

HSDirs keep their collection of descriptors separated into logical databases,
according to their position in the sequence of ℓ servers in the hash ring; that is,
the ‘one’ database held by an HSDir is identical to the ‘two’ database held by
the next server, and so on. When a client makes their PIR query to each server,
they must also indicate which logical database they are querying from, so that
each database they query from is the same.


3 Currently, the number of times the descriptor is replicated (and thus the number
of places it can be accessed from) is hsdir_n_replicas times hsdir_spread_store,
which is currently 8. To ensure that there are the same number of logical databases
where a descriptor can be accessed from, hsdir_n_replicas would have to be
increased.


Improving the Privacy of Tor Onion Services 281


Note that for PIR schemes, it is crucial that all of the databases distributed
across the ℓ servers are identical. Ensuring the consistency of replicated data
across servers is of course a fundamental problem in the study of distributed 
systems. For this reason, specific solutions to the problem are largely orthogonal 
to this work. However, in the extended version [12, App. C.1] we outline some 
of the general approaches that can be used in Tor in order to address this issue. 

Perfect Hashing. In most PIR schemes, clients look up a particular index 
in a database without revealing that index to the server [6,21]. Chor et al. [5] 
propose a number of ways to use a PIR scheme such that clients can look up 
records by a string instead of a record index. One of their techniques uses a 
construction called perfect hashing. Given a set of D keys (which are arbitrary 
strings), a perfect hash function (PHF) maps these D keys injectively into integers
in the range [0, ..., r − 1], where r = c · D and c is a small constant, typically
in the range of 1 to 2. In our application, we choose to maintain a small c so as to 
maintain a smaller-sized PIR database. Ideally, we would like c = 1, which results 
in a variant of perfect hash functions known as minimal perfect hash functions 
(MPHF). The information needed to evaluate an MPHF requires slightly more 
bits per key (D) to describe than a general PHF, but in our context this results 
in us being able to maintain a smaller PIR database size, which would intuitively 
result in overall gain. Therefore an MPHF seems like the ideal solution to our 
indexing challenge. 
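
To make the idea concrete, the sketch below builds a minimal perfect hash function for a small key set by brute-force search over seeds. This is only a toy illustration under stated assumptions: the seeded SHA-256 slot function and the function names are hypothetical, the search cost grows very quickly with the number of keys, and it is not the MPHF construction used in this paper (see the extended version for that).

```python
import hashlib


def slot(seed: int, key: bytes, num_slots: int) -> int:
    """Map a key to a slot in [0, num_slots - 1] using a seeded hash."""
    digest = hashlib.sha256(seed.to_bytes(8, "big") + key).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def build_mphf(keys: list, max_seed: int = 1_000_000) -> int:
    """Return a seed for which `slot` is a bijection from keys to [0, D - 1]."""
    num_keys = len(keys)
    for seed in range(max_seed):
        if len({slot(seed, key, num_keys) for key in keys}) == num_keys:
            return seed
    raise ValueError("no suitable seed found; use a real MPHF construction")


if __name__ == "__main__":
    # Hypothetical lookup keys standing in for descriptor identifiers.
    keys = [f"blinded-key-{i}".encode() for i in range(6)]
    seed = build_mphf(keys)
    for key in keys:
        print(key.decode(), "->", slot(seed, key, len(keys)))
```

A client that knows the seed can turn its lookup string into a record index locally and then issue an ordinary index-based PIR query for that index.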

We discuss the details of the MPHF we use, provide benchmarks for it, as 
well as discuss why this does not entail any concerning leakages in the extended 
version [12, App. D]. Later in Sect. 5.1 we give optimizations to resolve indexing
in hardware-assisted PIR schemes without the use of MPHFs, thus eliminating 
these leakages entirely in the hardware-assisted PIR case. 


4 Privacy Analysis for PIR Schemes 


To evaluate candidate PIR schemes, we must discuss what underlying assump- 
tions give a scheme its privacy properties, and how these assumptions hold up 
in Tor. The Tor context in no way affects schemes that make only computational 
assumptions. This means that single-server computational PIR schemes 
can be trusted to the extent that the underlying cryptographic assumptions are 
trusted. For XPIR and Seal-PIR, this corresponds to the Learning With Errors 
assumption, widely believed to be secure by cryptographers. The problem has 
received increased scrutiny and cryptanalysis due to post-quantum cryptography 
standardization efforts by NIST [24] and other standardizing bodies. 

For hardware-based schemes, privacy guarantees depend on the security of 
trusted enclaves. In our implementation we leverage Intel SGX as the secure 
hardware module for our hardware-aided PIR scheme. Trusting the hardware in 
this case boils down to being able to verify that an HSDir that claims to support 
hardware-aided PIR does in fact run on a processor with such hardware support. 
In Intel SGX, this is done via remote attestation [17], towards which Intel issues 
certificates that validate the claims made by its processors with SGX support. 
In the event of Intel “going rogue”, they can at most misissue future certifi- 
cates. This would not affect the security of PIR lookups that happened before 
misissuance. Additionally, we implicitly trust these modules to deliver the con- 
fidentiality and integrity guarantees they claim. However, recently researchers 
have demonstrated side-channel vulnerabilities of SGX that attack its confiden- 
tiality. These works have also demonstrated defences which are actively being 
incorporated by Intel, and this is a natural part of a new hardware component’s 
lifecycle. In both of these classes of trust violations, our design still has forward 
secrecy in that the privacy of past queries cannot be compromised by future vio- 
lations. Furthermore, even in such a worst-case event, the security of our scheme 
would simply reduce to the current status quo. We also note that while we used 
SGX to prototype our work, the underlying techniques can be adapted onto any 
of the other existing processors with secure hardware capabilities such as ARM 
TrustZone [3], AMD SEV [19], or their open-source sibling Keystone [20]. 

The privacy guarantees provided by multi-server ITPIR schemes are based 
on non-collusion assumptions made about the servers involved in servicing the 
queries. To guarantee the privacy of a query made by a client, we must assume 
that the ℓ servers involved in the query do not collude to break the privacy of this 
query. For non-robust schemes like Chor et al.’s, this assumption holds as long as 
at least one server does not collude to break a client’s privacy. For robust schemes 
like Goldberg’s [15], this is generalised so that as long as no more than t servers 
collude, privacy is still guaranteed. To analyze this assumption, we imagine an 
adversary who controls some number a of the Tor HSDirs, and then ask various questions about the 
probability they are able to break the non-collusion assumption. Remember that 
the position of a HSDir in the hash ring is determined by random values that 
the nodes have no control over, so that an adversary cannot adaptively position 
themselves in a hash ring to compromise security. 
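
The non-collusion assumption for the non-robust case can be made concrete with the classic XOR-based construction in the style of Chor et al. The sketch below is illustrative only and is not the implementation evaluated in this work; the function names are hypothetical and records are assumed to have equal length. The client splits a one-hot selector into ℓ random bit vectors, so any ℓ − 1 servers together see only uniformly random bits.

```python
import secrets


def xor_bytes(x, y):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(x, y))


def make_queries(num_records, index, num_servers):
    """Split the one-hot selector for `index` into num_servers bit vectors."""
    shares = [[secrets.randbits(1) for _ in range(num_records)]
              for _ in range(num_servers - 1)]
    last = [int(j == index) for j in range(num_records)]
    for share in shares:
        # The XOR of all shares must equal the one-hot selector.
        last = [b ^ s for b, s in zip(last, share)]
    return shares + [last]


def answer(database, query):
    """Server side: XOR together every record selected by the query bits."""
    acc = bytes(len(database[0]))
    for record, bit in zip(database, query):
        if bit:
            acc = xor_bytes(acc, record)
    return acc


def reconstruct(answers):
    """Client side: XOR the servers' answers to recover the wanted record."""
    out = bytes(len(answers[0]))
    for ans in answers:
        out = xor_bytes(out, ans)
    return out
```

Each record other than the queried one is selected an even number of times across the ℓ queries and cancels out, which is why all ℓ databases must be identical. Robust schemes such as Goldberg's replace the XOR sharing with a threshold sharing so that privacy survives up to t colluding servers.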

In addition to analyzing the probability an adversary can compromise pri- 
vacy, we need to analyze the robustness of these schemes and the probability 
that the adversary can disrupt the availability of descriptors by behaving in a 
malicious way. In this section we will present our analysis on privacy compro- 
mise of queries, and in the extended version [12, App. C] we provide an in-depth 
analysis of availability compromise in the multi-server ITPIR model. With a 
multi-server scheme, the privacy can be compromised if the adversary is able to 
control at least t + 1 out of the ℓ servers involved in the PIR query (with the 
value of t depending on the particular scheme). In order to evaluate how much 
more difficult this makes the adversary’s task, we assume the adversary controls 
some number a of the n Tor HSDirs overall. We then ask for the probability that, 
when the hash ring is constructed, there is a consecutive sequence of ℓ servers in 
the hash ring (an “ℓ-block”) where the adversary controls at least t + 1 of them. 

Fig. 1. Comparing our provided upper bound with experimental results for the probability 
an adversary is able to control any ℓ-block and violate query privacy in an ITPIR 
setting. Here n = 4000, ℓ = 5, and t = 2. Experiments were performed by simulating 
100,000 hash rings independently. Again recall that in the current Tor setup without 
PIR, this query privacy is always violated. 

We can estimate how often this may happen using a combination of experimental 
and enumerative techniques. An exact enumeration is a somewhat 
challenging combinatorial problem, but we can establish an upper bound. We 
provide an upper bound for the number of configurations of a hash ring with n 
nodes, of which a subset of size a is controlled by an adversary, such that at 
least one set of consecutive ℓ nodes contains at least t + 1 adversarial nodes: 


[Equation: definition of the upper bound U(n, a, ℓ, t)] 


This equation does not perfectly enumerate the number of hash rings where 
the adversary controls an ℓ-block. It overcounts this number of hash rings, 
because hash rings where the adversary controls multiple blocks are counted 
once per block. However, since it strictly overcounts, the equation can be used 
as an upper bound. We leave a tighter bound as future work. 


Lemma 1. The number of hash rings of size n in which an adversary controlling 
a nodes controls at least t + 1 of a sequence of ℓ consecutive nodes is upper bounded 
by U(n, a, ℓ, t). 


The combinatorial proof can be found in the extended version [12, App. E]. 
We can upper bound the probability that an adversary can compromise at least 
one database by dividing U(n, a, ℓ, t) by (n − 1)!, the total number of hash rings 
on n nodes. To see how this upper bound compares to the actual probability, we 
perform a series of experiments, varying the number of adversaries in the hash 
ring and observing the frequency with which the adversaries control an ℓ-block. 
With n = 4000, ℓ = 5, and t = 2, we varied the number of adversaries from 0 
to 120. Our results are shown in Fig. 1. Notably, the lower a is, the tighter our 
upper bound is relative to the true probability. This is not surprising, 
as our upper bound overcounts hash rings where the adversary controls multiple 
ℓ-blocks, which occurs more frequently when a is higher. 
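
The kind of experiment described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: adversarial relays are assumed to occupy a uniformly random subset of ring positions, and `overcount_bound` is our own simple union bound built on the same overcounting idea. It is not claimed to be the U(n, a, ℓ, t) of Lemma 1.

```python
import random
from itertools import combinations
from math import comb, prod


def has_bad_block(adv_positions, n, ell, t):
    """True if some window of ell consecutive ring slots holds >= t+1 adversaries."""
    adv = [0] * n
    for p in adv_positions:
        adv[p] = 1
    window = sum(adv[:ell])
    if window >= t + 1:
        return True
    for start in range(1, n):
        # Slide the window one slot around the ring (indices wrap modulo n).
        window += adv[(start + ell - 1) % n] - adv[start - 1]
        if window >= t + 1:
            return True
    return False


def simulate(n, a, ell, t, trials, seed=0):
    """Monte Carlo estimate of the probability that a bad block exists."""
    rng = random.Random(seed)
    hits = sum(has_bad_block(rng.sample(range(n), a), n, ell, t)
               for _ in range(trials))
    return hits / trials


def exact_probability(n, a, ell, t):
    """Exhaustive enumeration; only feasible for tiny rings."""
    subsets = list(combinations(range(n), a))
    return sum(has_bad_block(s, n, ell, t) for s in subsets) / len(subsets)


def overcount_bound(n, a, ell, t):
    """Union bound (our assumption): pick the first adversary of a block,
    then t further adversarial slots among the next ell - 1 positions."""
    if a < t + 1:
        return 0.0
    ratio = prod((a - i) / (n - i) for i in range(1, t + 1))
    return min(1.0, a * comb(ell - 1, t) * ratio)
```

For the parameters of Fig. 1 one would call, for example, `simulate(4000, 60, 5, 2, trials)` and compare against `overcount_bound(4000, 60, 5, 2)`; pure Python is slow at 100,000 trials, so a smaller trial count is advisable for a quick check. As with the bound in the text, the gap widens as a grows because rings with several bad blocks are counted repeatedly.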
5 Benchmarking and Results 


In this section, we discuss evaluations of selected PIR schemes we proposed 
earlier in Sect. 3. All our microbenchmarks are run on a single server-grade Intel 
Xeon E3-1270, with four physical cores, 64 GB of DDR4 RAM, and support for 
Intel SGX. Our server machine runs Ubuntu 16.04, and all our experimental 
results for the systems we measure are for a single core without any parallelism. 
For the end-to-end integration experiments we reuse the same server as the 
HSDir node, and use a 1.8 GHz i7-8565U laptop as the client. The source code 
to reproduce our experiments is available on our website.⁴ 

In order to get a more complete picture of how many lookup requests an 
HSDir handles at a time, as well as the sizes of those descriptors, we monitored 
the activity on an HSDir for around eight months.⁵ In terms of sizes, about 
81% of the v3 descriptors we observed were <16 KiB, 7% were between 16 and 
32 KiB, and 12% between 32 and 48 KiB. Hence for all of the PIR schemes we 
evaluate, we are concerned with large record sizes (approximately 16 KiB) since 
hidden service descriptors are about that size. Larger v3 descriptors correspond 
to onion services that have a large encrypted list of authorized clients. To avoid 
leaking whether, and how many, authorized clients there are, hidden services may 
add fake lines to the descriptor to pad the length [32]. For microbenchmarks on 
multi-server PIR, we refer to Devet et al. [9], which provides a comprehensive 
picture on implementation details of many different configurations of multi-server 
PIR. Multi-server PIR schemes are fast, and we do not foresee the performance 
of these schemes being a bottleneck. 


5.1 Hardware-Assisted PIR Benchmarks 


For hardware-assisted PIR schemes, we leverage ZeroTrace and benchmark four 
different PIR variants using it: two variants of linear scan 
(one where the data is stored in the Processor Reserved Memory (PRM) pages, 
and the other where it is stored outside the PRM), Path ORAM, and Circuit 
ORAM. The linear scan variants do not face the indexing challenge, unlike the 
other PIR schemes. However, Circuit ORAM and Path ORAM do have the 
indexing problem to address. Instead of using an MPHF however, notice that 
in this context, indexing for the ORAM scheme can be achieved more simply 
by performing a linear scan over an array that maps blinded public keys of 
the hidden service descriptors to indices in the ORAM scheme. The overhead 
induced by this linear scan is minimal since each record in this array is a 32-byte 
key and an 8-byte index, and the scan is significantly faster than scanning the entire 


4 https://crysp.uwaterloo.ca/software/piros/. 

5 For privacy reasons, we only gathered bucketized numbers and sizes of descriptors, 
and never the actual descriptors themselves. In the extended version [12, App. F] 
we give more information on how we followed Tor research safety guidelines. 

6 The smallest v3 descriptors are 14200 bytes; descriptors of this size are the over- 
whelming majority of descriptors in the < 16 KiB pool. 
Fig. 2. Detailed microbenchmarks evaluating server computation time, client compu- 
tation time, and total bandwidth overheads as a function of the number of descriptors 
held by the server. Trivial PIR requires no computation and is hence omitted from (a). 
For (b) and (c) we only display the line corresponding to ZeroTrace’s Linear Scan with 
data stored outside PRM as representative of all the ZeroTrace variants since the client 
computation and bandwidth overheads of all the ZeroTrace variants are identical. 


collection of hidden service descriptors. Our microbenchmarks for both ORAMs 
are hence inclusive of this online cost of index resolution via linear scanning. 
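
The index-resolution step for the ORAM variants can be sketched as a scan that touches every entry of the key-to-index array regardless of where the match is. This is a minimal illustration under our own assumptions (the name `oblivious_index_lookup` and the sentinel `-1` are hypothetical); Python gives no real constant-time guarantee, whereas enclave code would use branch-free machine instructions for the select.

```python
import hmac


def oblivious_index_lookup(table, query_key):
    """Return the ORAM index stored for query_key, or -1 if absent.

    table: list of (32-byte blinded public key, integer index) pairs.
    Every entry is read and compared, so the sequence of memory accesses
    does not depend on which entry (if any) matches.
    """
    result = -1
    for key, index in table:
        match = int(hmac.compare_digest(key, query_key))  # 1 on match, else 0
        # Arithmetic select instead of branching on the secret comparison.
        result = match * index + (1 - match) * result
    return result
```

At 1702 descriptors the array is 1702 × 40 bytes, roughly 68 KB, compared with about 27 MiB for a scan over the padded 16 KiB descriptors themselves, which is why this step adds little to the ORAM access cost.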

We note that the PIR parameter for such a hardware-assisted ORAM scheme 
is simply a public key, under which the queries are encrypted by a client, such 
that only the SGX enclave can decrypt it. Hence these constructions also have the 
additional benefit that the tiny parameter size means they can forego the extra 
round trip required to fetch the PIR parameters, for example by simply including 
the public key in the Tor consensus directly. We observe later in Sect. 5.3 that 
doing so results in almost no perceivable overheads in the end-to-end latencies 
experienced by a user loading an onion service. 

Our microbenchmarks in Fig. 2a show that among the two linear scan vari- 
ants, storing the data outside of the PRM scales better. Although counterintu- 
itive, this arises from the fact that on Intel SGX, the PRM is limited to about 
90 MB and thus when the data to be stored crosses this threshold, it leads to sig- 
nificant overheads induced by page faults. From our rough estimate in Sect. 2.1, 
the total number of v3 onion services that an HSDir server holds today is close 
to 1500. Hence for the remainder of this section we compare the schemes at 
the datapoint of 1702.⁷ However, we note that both the linear scan variants of 
PIR in fact provide the best server computation time of 7.04±0.04 ms for 1702 
descriptors. In comparison, the ORAM schemes are slightly more computationally 
expensive, as seen in the figure, with server computation times of 33.2±0.4 ms 
and 14.2±0.5 ms for Path ORAM and Circuit ORAM respectively at the same 
number of descriptors; however, as the number of descriptors increases, they 


7 For the microbenchmarks in Fig. 2 we chose data points evenly across the exponentially 
increasing x-axis, resulting in the odd data point of 1702 as closest to (but 
exceeding) 1500. 
soon outperform the linear scan variants. In terms of the bandwidth overheads 
induced, all four of these schemes are optimal since they leverage secure hard- 
ware at the server side, thus allowing the queries and responses to simply be AES 
encryptions of the blinded public key and hidden service descriptor (padded to 
16 KiB) respectively. 


5.2 CPIR Microbenchmarks 


We show a detailed evaluation of XPIR and SealPIR in Fig. 2 covering both 
computational and bandwidth overheads. In our experiments, we force XPIR 
to use LWE with 80-bit security, while tuning SealPIR’s parameters to do the 
same. We allow XPIR’s optimizer module to select the parameters d (recursion 
levels) and α (aggregation factor), for the best overall time for performing a 
PIR request. SealPIR’s implementation is currently limited to small data record 
sizes; specifically, the implementation expects that a single data record will fit 
into a single plaintext polynomial, which limits an individual data record size 
to 1.5 × N bytes, where N is the degree of the underlying polynomial. (The 1.5 × N 
arises from the fact that with plaintext modulus t = 12 bits, a completely filled 
degree-N plaintext polynomial can store exactly 1.5 bytes per coefficient.) 

For large data records, one would store the data over multiple plaintext poly- 
nomials. Hence in our evaluations, we extrapolate SealPIR results by assuming 
that the costs of query processing and reply extraction will increase by a factor 
of K, where K is the number of plaintext polynomials required to store a single 
hidden service descriptor. To this end, we run our SealPIR experiments with a 
record size of 3000 bytes. (For N = 2048, 3072 B is the maximum data size that a 
plaintext polynomial can store.) The query processing time (excluding the time 
for expansion) and reply extraction are then multiplied by K = 6 to meet our 
required 16 KiB descriptor size. Figures 2a and 2b highlight the server compu- 
tation time and client computation time induced by these schemes respectively. 
SealPIR and XPIR scale computationally poorly at the server side as well as the 
client side due to the underlying computation overheads of the FHE schemes 
(FV [14] and BV [4] respectively) that they use. The XPIR and SealPIR points 
in our graphs show irregularities since we allow the optimizer to select opti- 
mal parameters for each problem size. For all of our experiments we note that 
the XPIR optimizer chose to not use recursion, but instead heavily aggregate 
data blocks using high values of the aggregation factor α (in the range of 8 to 60). In order to choose 
the optimal recursion point for SealPIR for a given problem size, we evaluate 
SealPIR with both choices of d and present the one corresponding to minimum 
total time in our graphs. The break in SealPIR points in these graphs corresponds 
to switching the number of recursion levels (d) from 1 to 2. 
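
The record-size arithmetic behind the extrapolation factor K can be checked with a few lines. This is only a worked calculation under the parameters stated in the text (N = 2048, 12-bit plaintext modulus, 16 KiB descriptors); the function names are hypothetical.

```python
from math import ceil


def plaintext_capacity_bytes(poly_degree, plaintext_bits):
    """Bytes that fit in one fully packed plaintext polynomial."""
    return poly_degree * plaintext_bits // 8


def polynomials_per_record(record_bytes, poly_degree, plaintext_bits):
    """Number K of plaintext polynomials needed to hold one record."""
    capacity = plaintext_capacity_bytes(poly_degree, plaintext_bits)
    return ceil(record_bytes / capacity)


capacity = plaintext_capacity_bytes(2048, 12)        # 2048 * 1.5 bytes
k = polynomials_per_record(16 * 1024, 2048, 12)      # 16 KiB descriptor
```

With these inputs `capacity` is 3072 bytes and `k` is 6, matching the maximum plaintext size and the multiplier K = 6 used to scale the measured query processing and reply extraction times.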

At 1702 records of 16 KiB, XPIR induces a server computation overhead 
of 31±3 ms, and SealPIR 451±1 ms. The overheads induced by these CPIR 
schemes are barely practical today and higher than their ZeroTrace counterparts 
by about an order of magnitude. However the client-side overheads at the same 
datapoint (158±12 ms and 13.32±0.03 ms for XPIR and SealPIR respectively) 
Fig. 3. Comparison of end-to-end latencies for connecting to an onion service with 
a plain Tor client vs. a Tor client with PIR support (with and without our optional 
optimization that saves the network round trip for fetching the PIR parameters). Our 
optimization saves approximately one second as we see from the difference in medi- 
ans for the two PIR modes across the data points we collected. Moreover, end-to-end 
latencies for a user are barely impacted by incorporating PIR as the overheads of PIR 
are hidden by the noise of Tor network costs. 


are more concerning, as they may be much higher for lightweight clients such as mobile 
users with weaker CPUs. 

Queries in XPIR are encryptions of a bit vector of length corresponding 
to the number of records stored at the PIR server, with the bit at the index 
being queried set to 1 (and 0s elsewhere). SealPIR introduces the notion of a 
compressed query, where the query is just an encryption of the queried index 
itself. In terms of reply extraction time, SealPIR and XPIR are quite close; the 
difference in total client computation time arises because client query 
generation is much faster for SealPIR than for XPIR, owing to this query 
compression technique. Overall, however, ZeroTrace is about two orders of magnitude 
better in terms of client computational overheads, as seen in Fig. 2b. 

However, this query compression technique has its own costs. First, it induces 
additional server-side computation for expanding this compressed query. Second, 
it limits the size of data that can be stored in a single underlying FV plaintext. 
Specifically, SealPIR has to force a plaintext modulus of t = 12 bits out of its 
coefficient modulus of q = 60 bits (as detailed by Angel et al. [2, §6] and seen in 
their implementation), so that after expansion and query processing the under- 
lying plaintext is still decryptable with very high probability. The impact of this 
is not obvious from SealPIR’s original evaluations, as they limit themselves to a 
small record size of 288 bytes, which fits within a single FV plaintext polynomial 
even with such low values of t. Ultimately this technique makes SealPIR per- 
form well in the context of large numbers of small records; however, this is the 
opposite of our context, where each HSDir has only a relatively small number of 
descriptors but of fairly large size, and this is reflected in our benchmarking. 

Finally in Fig. 2c, we see the total bandwidth overhead imposed by these 
schemes. Here we also include trivial PIR (clients download all descriptors whenever 
they make a query) as a baseline to compare the proposed PIR schemes 
against. At the datapoint of 1702 hidden service descriptors we see that XPIR 
request and response sizes are around 7.5 MB and 2.5 MB respectively. 

While SealPIR is more bandwidth-viable than trivial PIR, it still requires 
about two orders of magnitude more bandwidth than any of the ZeroTrace coun- 
terparts. SealPIR alleviates the request size overhead using the aforementioned 
query compression technique. Thus both query generation time and query size are 
significantly smaller than those of XPIR; at the same 1702 hidden service descriptors 
mark, the request and response sizes are around 64 KiB and 1.9 MiB⁸ respectively. 


Concurrency and Computational Requirements. We note that both the 
CPIR schemes are parallelizable, and so are the linear scan variants of Zero- 
Trace, but not the ORAM counterparts. Hence multiple queries can be handled 
concurrently for these schemes. Even for the sequential ORAM schemes, mul- 
tiple cores on the machine can be used to run several instances of the ORAM 
enclaves allowing it to serve concurrent queries. 

We also collected the number of v2 and v3 lookups that our HSDir received 
during the months of April to June 2020. During this period, typically the HSDir 
served approximately 1500 queries in an hour. From the server computation 
microbenchmarks above and in the event that just a single core is used by the 
server to serve PIR, this would take less than 10.6 s in an hour to serve all these 
queries using the linear scan variant of ZeroTrace, while XPIR and SealPIR 
require close to 46 s and 11.3 min respectively to serve 1500 requests. However, 
on some days we note spikes up to a maximum of 766,000 requests in an hour,⁹ at 
which point the load cannot be supported by a single core alone; however, as we 
mention in Sect. 3, PIR operations should be handled asynchronously in a sep- 
arate thread anyway. Using two cores on these HSDir machines asynchronously, 
even these peak loads can still be handled smoothly for the ZeroTrace variants. 
However, XPIR and SealPIR would require more than 7 and 96 cores respectively 
to handle such peak loads, making them prohibitive for deployment. 
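
The load figures above follow directly from the per-query server times. The sketch below reproduces the arithmetic; the per-query times are the microbenchmark values at 1702 descriptors, with the XPIR time taken as about 31 ms, which is what 46 s for 1500 requests implies. The dictionary and function names are hypothetical.

```python
from math import ceil

# Per-query server computation time in milliseconds at 1702 descriptors.
SERVER_MS = {
    "ZeroTrace linear scan": 7.04,
    "XPIR": 31.0,      # assumption: inferred from 46 s per 1500 requests
    "SealPIR": 451.0,
}


def busy_seconds(queries_per_hour, ms_per_query):
    """CPU seconds consumed per hour on a single core."""
    return queries_per_hour * ms_per_query / 1000.0


def cores_needed(queries_per_hour, ms_per_query):
    """Smallest number of cores whose combined hour covers the load."""
    return ceil(busy_seconds(queries_per_hour, ms_per_query) / 3600.0)


typical = {name: busy_seconds(1500, ms) for name, ms in SERVER_MS.items()}
peak = {name: cores_needed(766_000, ms) for name, ms in SERVER_MS.items()}
```

Under these inputs the typical load costs about 10.6 s, 46.5 s, and 676.5 s (11.3 min) of CPU time per hour for the three schemes, and the peak of 766,000 requests per hour needs 2, 7, and 96 fully utilized cores respectively. Since cores cannot be run at full utilization in practice, the CPIR schemes would need somewhat more than that.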


5.3 Tor Integration Results 


Finally, we also implemented and evaluated the impact on end-to-end latencies 
induced in Tor to connect to a hidden service when using PIR. For evaluating 


8 Above ≈1000 descriptors, the SealPIR size jumps to about 1.9 MiB due to recursion, 
as seen in Fig. 2c. Recursion in CPIR schemes increases the response size by a factor 
of f at each level, where f is the ciphertext expansion factor. Since the SealPIR 
compression technique reduces the effective plaintext modulus, it increases f from 
the expected ≈ 7 to 10. The expected f ≈ 7 arises from the fact that the underlying 
FV scheme can use a plaintext modulus t = 23 for 80-bit security with a coefficient 
modulus q of 60 bits, but since SealPIR uses an effective plaintext modulus t = 12 
and q = 60, and ciphertexts contain two polynomials, the total ciphertext expansion 
f is 10. 

9 These spikes were for v3 descriptors, and were presumably due to the HSDir holding 
an extremely popular descriptor, as we did not observe a corresponding spike in the 
number of v3 descriptors held. 


our proposal on the live Tor network, we ran a Tor relay that would serve as an 
HSDir, instrumented with modifications to support asynchronous PIR querying. 
Specifically, in addition to handling incoming HS descriptor stores as a nor- 
mal Tor relay would, it also inserted the incoming HS descriptor into the PIR 
scheme’s store. Full details of this are included in the extended version [12, App. 
A]. Similarly, we instrumented a Tor client to make PIR requests to this relay, 
and finally we created yet another Tor process that was modified to upload hid- 
den service descriptors only to our HSDir with PIR support. In our experiments, 
we used this hidden service generator to generate several hidden services to a 
local web server, and then timed the curl requests for our client to perform a 
HS descriptor lookup and establish a connection with these hidden services. For 
privacy reasons, we only queried for the descriptors we ourselves uploaded. 

With the above described setup, we evaluate three different clients: a standard 
Tor client, a Tor client that uses ZeroTrace’s linear scan PIR (with the 
underlying data stored outside the PRM), and an optimized client that does 
not perform an additional round trip for the parameter fetch, assuming that the 
parameters were already available to the client as we describe in Sect. 5.1. We 
use the linear scan variant of ZeroTrace, since we know this to be the appropriate 
choice with the current scale of hidden services from our microbenchmarks in 
Fig. 2a. We note that end-to-end latencies are subject to a lot of variance due 
to variability of several factors such as choice of relays for constructing circuits, 
and unpredictable network conditions encountered by different requests. Hence 
we present our findings in the form of a boxplot in Fig. 3. 

For each of the datapoints in Fig. 3, we collected the timing reported for 
100 curl requests to a hidden service that was not in the client’s cache. The 
impact of our proposed optimization for compressing the additional round trip 
is immediately evident from this figure, as the medians for these two conditions 
seem to differ by almost an entire second across a majority of the datapoints 
we collected. Furthermore, we notice that deploying PIR in practice does not 
drastically impact the end-to-end latencies experienced by a user connecting to 
a hidden service, as the PIR overheads are completely hidden within the noise 
of overheads of using the Tor network. 


6 Conclusion 


HSDirs serve a unique purpose in the Tor network, acting as a DNS server for 
.onion addresses. For this reason it is important to make sure we can completely 
characterize the information HSDirs are privy to in their roles. We have shown 
that HSDirs have access to a relatively high amount of information in the Tor 
network. We find that the property of unlinkability, which is intended to guar- 
antee that onion services cannot be tracked by HSDirs over time, is not provided 
to all onion services due to the HSDir’s ability to count the number of queries 
made for each service in an epoch. 
Table 1. Summarizing the results of all schemes that were considered 


Scheme          Sec. Guarantee  Required Changes  BW Overhead  Availability 
Current         None            None              None         Unchanged 
CPIR [1,2]      LWE             Minimal           Large        Unchanged 
ZeroTrace [27]  Hardware        Minimal           Minimal      Unchanged 
ITPIR [6,15]    Probabilistic   Major Changes     ×ℓ           Increased with v 


To prevent this information leakage in the future, we investigate the inte- 
gration of PIR into Tor. This integration is a complex problem due to the large 
design space and many PIR options, each with their own requirements, guaran- 
tees, and drawbacks. In this work we have thoroughly explored these options, 
explaining their strengths and weaknesses, and show how to integrate them into 
Tor. We conclude by discussing the options we have investigated and their suit- 
ability for our purpose. Our results are summarized in Table 1. 

XPIR and SealPIR are attractive options because of their well-understood 
security assumptions (LWE) and the minimal changes needed to the structure 
of the hash ring. Clients would still only query a single HSDir in the hash ring, 
and the availability of the descriptors would be unaffected. However, the heavy 
computational and bandwidth burden incurred by XPIR is too high. SealPIR 
on the other hand alleviates the bandwidth overhead due to its small query size, 
but worsens the computational overhead. 

In contrast, multi-server PIR schemes are very efficient, and the extra bandwidth 
used mainly comes from the fact that ℓ queries are made, instead of one. 
However, their security guarantees are probabilistic, and in each epoch, there 
is a possibility that enough adversarial HSDirs will be placed into an ℓ-block 
to compromise the privacy or availability of a query. Furthermore, multi-server 
PIR schemes require major changes to how descriptors are stored and queried. 

Hardware-based PIR schemes are attractive in some senses, but challeng- 
ing in others. Like XPIR and SealPIR, they require only minimal changes to 
the structure of the hash ring and process by which descriptors are queried. 
Availability is unchanged compared to the current state of the network. As well, 
these schemes perform very well, and add minimal bandwidth costs. The draw- 
back with these schemes is that they depend on the security and availability 
of the hardware used. The security of trusted execution environments like Intel 
SGX is still being actively explored and improved. 

Any use of PIR for retrieving descriptors improves privacy over the current 
state of Tor. Currently all queries (made by querying a blinded public key) are 
readable by an HSDir, allowing them to selectively deny queries and correlate 
incoming queries with the descriptors they hold to gather conclusions that erode 
the privacy of both clients and onion services. PIR can thus provide a significant 
improvement to the privacy of Tor onion services. 
Abstract. In this paper we investigate the field of privacy-preserving 
authenticated key exchange protocols (PPAKE). First we make a cryp- 
tographic analysis of a previous PPAKE protocol. We show that most of 
its security properties, including privacy, are broken, despite the security 
proofs that are provided. Then we describe a strong security model which 
captures the security properties of a PPAKE: entity authentication, key 
indistinguishability, forward secrecy, and privacy. Finally, we present a 
PPAKE protocol in the symmetric-key setting which is suitable for con- 
strained devices. We formally prove the security of this protocol in our 
model. 


Keywords: Authenticated key agreement · Internet of Things · 
Cryptanalysis · Privacy · PPAKE · Security model 


1 Introduction 


Entity authentication and indistinguishability for the session key are the pri- 
mary goals that a key exchange protocol aims at achieving. With the growth 
of social networks, and virtual communications, privacy-preserving techniques 
have gained interest in the design of real-world protocols (e.g., TLS 1.3 [32]). 
With the development of the Internet of Things (IoT) and its novel use cases, 
interest in privacy is revived. 

IoT provides applications in many fields: patient remote monitoring, energy 
consumption, air pollution control, traffic management, retail and logistics, etc. 
IoT technologies deal with and combine sets of data, which makes it increasingly 
difficult to distinguish between information that enables identification and 
information that does not [38]. For instance, smartphones gather a critical amount of 
private data about their owner (identifiers, location, activity) that bear privacy 
risks. The diversity of connected objects forms a large intelligent network that 
can serve as a medium for the leakage of personal data [29]. The 
threats induced by the distributed nature of the IoT were highlighted early on [23], 
among which one can cite identification, tracking, and profiling. 

© Springer Nature Switzerland AG 2022 

Devising a security protocol for the IoT is a challenging task since the devices 
that must implement and execute the protocol are constrained in terms of energy, 
computation, and memory in particular. Consequently, the protocols are often 
built on symmetric-key functions for efficiency reasons. In turn, these “sym- 
metric” protocols do not achieve the same security properties as “asymmetric” 
protocols (i.e., based on public-key schemes). Adding yet another security prop- 
erty (privacy) is not a trivial task. 

In this paper we focus our attention on the SAKE protocol proposed by 
Avoine, Canard, and Ferreira [5]. Built solely upon symmetric-key functions, 
SAKE is an efficient protocol for constrained devices. It provides mutual authen- 
tication, key exchange, and forward secrecy. Its security is proved in a strong 
model (roughly the same type of model as those used to analyse protocols based 
on asymmetric algorithms). Moreover, with a suitable choice of symmetric func- 
tions, SAKE is quantum-secure. This raised the attention of the French National 
Cybersecurity Agency (ANSSI) which indicates that SAKE is a possible alter- 
native to current “classical” authenticated key exchange (AKE) protocols in a 
quantum world [2]. Our goal is to turn SAKE into a privacy-preserving protocol, 
while keeping all its security properties, and to formally prove the security of 
the resulting protocol in a model at least as strong as that used to analyse 
the original protocol. 


1.1 Related Work 


Most of the privacy-preserving protocols for low-resource devices, and the 
corresponding adversarial models, are related to the RFID field (e.g., [7,16–18, 
25,27,28,36] to cite a few). Privacy-preserving mechanisms have also been 
investigated in other IoT contexts such as smart homes [31,35] or low-power 
wide area networks (LPWAN) [4,37]. However, most of these works consider the 
privacy property only, focus on a specific setting (LPWAN), require specific 
hardware (physically unclonable functions), or are built on questionable techniques 
with respect to security and efficiency (chaotic maps). 

In [1], Aghili, Jolfaei, and Abidin propose a privacy-preserving authenticated 
key exchange protocol (PPAKE) with forward secrecy dedicated to IoT. This 
protocol builds upon Avoine et al.’s proposal [5]. Aghili et al. propose a variant 
of this protocol that aims in particular at guaranteeing privacy. However, and 
despite the security proofs they provide, their proposal is flawed (see Sect. 3). 

Restarting from Avoine et al.’s protocol, we fix the issues of Aghili et al.’s 
protocol, and devise a clean and proper security model that we use to formally 
prove the security of the corrected PPAKE protocol. 


1.2 Contributions 


In this paper we investigate the field of privacy-preserving authenticated proto- 
cols. First we make a security analysis of a previous PPAKE protocol [1]. Then 
we describe a new security model which captures privacy (among other secu- 
rity properties) for authenticated key exchange protocols. Finally, we present a 
PPAKE protocol secure in our model. 


Cryptanalysis. In [1], Aghili et al. propose a PPAKE protocol dedicated to IoT. 
Built upon a previous work by Avoine et al. [5], their proposal aims at keeping 
the same security properties as Avoine et al.’s protocol: entity authentication, 
key indistinguishability, and forward secrecy, and at being resistant to several 
attacks: “replay attacks”, “time-based attack”, and “tracking” (cf. [1, Sects. 6.1 
and 9]). We make a cryptographic analysis of Aghili et al.’s protocol and show 
that most of the claimed security properties are broken (we respect the same 
attack settings that are considered in [1], in particular the powers granted to the 
adversary). 


Security Model. We present a security model that captures strong guarantees 
for authenticated key exchange protocols. We extend the security model used by 
Avoine et al. [5] to prove the security of their protocol by introducing a crite- 
rion for indistinguishability of identities. That is, in order to define the privacy 
property, we borrow the concept of virtual identifier from Hermans, Pashalidis, 
Vercauteren, and Preneel [24], which appears also in Ouafi and Phan [30]. This 
concept allows hiding the identity of the party the adversary interacts with. The 
privacy property guarantees not only that the identity of an end-device is hidden, 
but that two different protocol runs are unlinkable. We also follow the paradigm 
proposed by Schwenk, Schage, and Lauer [33], and incorporate the privacy prop- 
erty together with the other security properties. This approach guarantees that 
the different security properties are independent of each other. More specifically, 
our resulting model requires that, say, the key indistinguishability property holds 
even in the presence of attacks that adaptively unmask identities. Conversely, 
confidentiality of identities is ensured even in the presence of queries that let the 
attacker reveal session keys. This yields a strong security model which can serve 
as a tool to analyse other authenticated key exchange protocols that implement 
mechanisms to guarantee privacy. 


Privacy-Preserving AKE. Starting anew from the SAKE protocol proposed by 
Avoine et al., we take another look at the concept of PPAKE for constrained 
devices. To the security properties guaranteed by SAKE, we add privacy. This 
results in a PPAKE protocol suitable for constrained devices that we naturally 
call Privacy-Preserving SAKE (PPSAKE). We formally prove that PPSAKE is 
secure in our strong security model. 


2 Description of the SAKE Protocol 


2.1 SAKE 


SAKE [5] is a two-party AKE based on symmetric-key functions and pre-shared 
keys (see Fig. 2). The two parties A and B share a derivation master key K and 
an authentication master key K'. In order to mutually authenticate, each party 
sends a pseudo-random value (r_A, r_B). A MAC tag is computed over this 
challenge and returned to the sender (messages m_B and m_A). The session key 
is computed from the two pseudo-random values r_A and r_B and the derivation 
master key: sk ← KDF(K, r_A, r_B). Forward secrecy is guaranteed by using a key 
evolving scheme. That is, once both parties are mutually authenticated and the 
session key is computed, the derivation master key is updated with a one-way 
function: K ← update(K). Therefore the previous session keys remain safe even 
if (updated) K is disclosed. 
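
The derivation and key-evolution steps above can be sketched as follows. This is only an illustration: instantiating KDF and update with HMAC-SHA256, the labels b"kdf" and b"update", the 16-byte nonces, and the helper run_session are assumptions made for the example, since [5] only requires suitable symmetric-key functions.

```python
import hashlib
import hmac
import os


def kdf(K: bytes, r_A: bytes, r_B: bytes) -> bytes:
    """sk <- KDF(K, r_A, r_B); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(K, b"kdf" + r_A + r_B, hashlib.sha256).digest()


def update(K: bytes) -> bytes:
    """K <- update(K); a one-way step, here HMAC-SHA256 (assumption)."""
    return hmac.new(K, b"update", hashlib.sha256).digest()


def run_session(K: bytes) -> tuple:
    """Derive a session key from fresh nonces, then evolve the master key."""
    r_A, r_B = os.urandom(16), os.urandom(16)  # assumed nonce length
    sk = kdf(K, r_A, r_B)
    return sk, update(K)
```

Because update is one-way, disclosing the evolved key returned by run_session does not help recompute the earlier K, which is the intuition behind forward secrecy in this design.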

As soon as two parties make a shared (symmetric) key evolve, a synchronisation 
problem arises: one of the parties has to make the first move whereas the 
other lags behind, at least temporarily. This issue is solved with the authentication 
master key. The initiator A in SAKE stores the authentication keys 
corresponding to three consecutive epochs: previous (K'_{j-1}), current (K'_j), and 
future (K'_{j+1}) (see Fig. 1). Upon reception of the MAC-ed challenge computed 
by B (message m_B), A detects which epoch B belongs to by checking its MAC 
tag. Then, in the subsequent message (m_A), A indicates to B whether it must catch up 
(with the bit ε). Likewise, if A is late, it updates its master keys and then proceeds 
with the regular operations (upon reception of message τ'_B). Eventually, 
both parties update the authentication master keys the same way they do for 
the derivation master key. Only the initiator needs to keep the authentication 
master keys of three consecutive epochs. Avoine et al. have proved that the initiator 
A can only be one step behind, in sync with, or one step ahead of 
B (hence the three keys K'_{j-1}, K'_j, K'_{j+1}). That is, δ_AB ∈ {-1, 0, 1} 
where δ_AB is the gap between A and B. Since the derivation master key and 
the authentication master key are independent, keeping previous authentication 
master keys does not jeopardise forward secrecy. 
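
The epoch detection performed by the initiator can be illustrated with a short sketch. The function names, the HMAC-SHA256 instantiation of the MAC and of update, and the epoch labels are assumptions made for the example; the sketch only shows the idea of checking the received tag against the three stored authentication keys.

```python
import hashlib
import hmac


def mac(key: bytes, msg: bytes) -> bytes:
    """MAC over msg; HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def update(key: bytes) -> bytes:
    """One-way evolution of an authentication key (assumed instantiation)."""
    return hmac.new(key, b"update", hashlib.sha256).digest()


def detect_epoch(keys: dict, msg: bytes, tag: bytes):
    """Return the label of the stored key that verifies tag, or None."""
    found = None
    for label in ("previous", "current", "future"):
        if hmac.compare_digest(mac(keys[label], msg), tag):
            found = label
    return found
```

A result of "previous" means that B still uses the key of the preceding epoch and must catch up, whereas "future" means that A itself is late.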

Once a correct and complete session ends, three goals are achieved in the same 
protocol run: (i) the two parties have updated their master keys, (ii) their master 
keys are synchronised, and (iii) they share a new session key. Therefore mutual 
authentication, key exchange (with forward secrecy), and resynchronisation are 
done in the continuity of a single session. Moreover, there is no need for an 
additional procedure (e.g., resynchronisation phase) or functionality (e.g., shared 
clock). The protocol is made of five messages at most, and can be reduced to 
four messages if the two parties are synchronised at the beginning of the session. 

Fig. 1. Party A stores authentication master keys corresponding to three consecutive 
epochs (j - 1, j, j + 1), and one derivation master key (illustration with j = 2 with 
the blue dashed box). Party B stores one sample of each master key (boxed in blue). 
(Color figure online) 

Fig. 2. The SAKE/SAKE-AM protocol. Elements surrounded by a blue dashed box 
appear only in SAKE. Elements boxed in blue appear only in SAKE-AM. (Color figure 
online) 


Notations. For the sake of clarity, we use the following notation in Fig. 2: 


– kdf corresponds to: sk ← KDF(K, r_A, r_B) 
– upd_A corresponds to: 
  1. K ← update(K) 
  2. K'_{j-1} ← K'_j 
  3. K'_j ← K'_{j+1} 
  4. K'_{j+1} ← update(K'_{j+1}) 
– upd_B corresponds to: 1. K ← update(K), 2. K' ← update(K'). 


Moreover, Vrf(k, m, τ) denotes the MAC verification function that takes as input 
a secret key k, a message m, and a tag τ. It outputs true if τ is a valid tag on 
message m with respect to k. Otherwise, it returns false. 


2.2 SAKE-AM 


From SAKE a complementary mode can be derived: SAKE-AM (which stands for 
“aggressive mode”). Compared to SAKE, the first message (id_A || r_A) is skipped. 
Hence, in SAKE-AM, B is the initiator (and stores two master keys K, K'). 
What becomes the first message is computed as m_B = id_B || r_B || τ_B with τ_B = 
Mac(K', id_B || id_A || r_B). The second message is computed as m_A = ε || r_A || τ_A (with 
τ_A computed as in SAKE). The other messages and calculations are essentially 
the same as in SAKE. 

Used together, SAKE and SAKE-AM allow any party to be either initiator 
or responder in a protocol run. Moreover the smallest amount of calculation 
is always done by the same party (irrespective of its role). This is particularly 
convenient in the context of a set of end-devices communicating with a back-end 
server. When the end-device wants to initiate a communication, protocol SAKE- 
AM is launched. Otherwise (the server is initiator), SAKE is used. Therefore, 
the end-device always does the lightest computations. 


3 A Flawed Proposal 


In [1] Aghili et al. propose to modify SAKE/SAKE-AM in order to turn the 
protocol into a privacy-preserving scheme. They consider a setting where party 
A is a server communicating with a set of end-devices (many parties B). They 
modify the SAKE and SAKE-AM protocols in order to achieve three main goals: 


1. Forbidding identification and tracking of a party B (in particular with id_B). 

2. Forbidding the replay of the first message (m_B) in SAKE-AM. In SAKE-AM, 
a message m_B corresponding to the previous epoch (i.e., computed with the 
authentication master key K'_{j-1}) can be replayed multiple times to A (until 
a new session is completed), and A computes and responds with a message 
m_A. Although this is not sufficient for the responder A to “accept” and to 
authenticate the initiator B (eventually the session aborts), Aghili et al. aim 
at preventing such a possibility. In contrast, in TLS 1.3 with 0-RTT mode 
[32] the server must deem the initial message (ClientHello) as authentic, 
and execute the request included therein [21]. Consequently, mitigations are 
necessary in TLS 1.3 (cf. [32, Sect. 8]). 

3. Forbidding recognition of a party B based on the amount of calculations done 
by A. In some cases (see below), when party A receives a message m_B in 
Aghili et al.'s version of SAKE and SAKE-AM, A must try all authentication 
master keys it stores in its database in order to verify the message, until 
a match is found. Therefore the time spent by A to find the correct key 
allows an adversary to recognise which party B communicates with A (the 
measurement done by the adversary is used as an index that designates B). 


3.1 Issues 


Aghili et al. modify the SAKE/SAKE-AM protocol as follows. First they add 
identifiers in the messages in order for two communicating parties A and B to distinguish 
which messages are intended for them, among the flow of messages sent 
by all parties. That is, they necessarily mix the communication and the application 
(cryptographic) layers since the former may include parameters (identifier) 
that contradict the goals they want to achieve.¹ In addition, upon reception of 
m_A, id_B is updated by B (A does the same): id_B ← update(id_B || K'). This 
new identifier value is transmitted in the subsequent message sent by B and also 
in the first message of the next protocol run. We can see that there is a first 
issue since the same identifier id_B is used in two consecutive sessions. Therefore 
it is trivial to track party B (this contradicts goal 1). 

Moreover, id_B is replaced with a pseudo-random value r_a in m_B if message 
m_A was not received by B during the previous session. The purpose is to avoid 
the same identifier id_B being used in two consecutive messages m_B (a correct 
message m_A triggers the update of id_B, hence id_B remains the same in the 
absence of such a message). In Aghili et al.'s version of SAKE-AM, this means 
that m_B = x || r_B || τ_B with x ∈ {id_B, r_a} and τ_B = Mac(K', id_B || id_A || r_B). When 
x = r_a, party A tries all the authentication master keys K' (corresponding to 
different communicating parties B) it stores in its database until a match is 
found. The issue here is that r_a is not included in the computation of τ_B even 
when it replaces id_B in m_B. Therefore an adversary can alter m_B without A 
being able to notice the change. This breaks entity authentication because, in 
the adopted security model, partnership is based on the notion of “matching 
conversations” (i.e., equality of transcript of messages) [26]. 
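
The issue can be reproduced with a small sketch in which the first field of the message is not covered by the MAC tag. The function names, the HMAC-SHA256 instantiation, the fixed 16-byte fields, and the field order used in tag_fixed are assumptions made for illustration; they are not taken from [1].

```python
import hashlib
import hmac
import os


def mac(key: bytes, msg: bytes) -> bytes:
    """MAC over msg; HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def tag_flawed(K_auth, id_B, id_A, r_B, x):
    # The transmitted field x is NOT an input of the tag.
    return mac(K_auth, id_B + id_A + r_B)


def tag_fixed(K_auth, id_B, id_A, r_B, x):
    # Hypothetical fix: the transmitted field x is authenticated as well.
    return mac(K_auth, x + id_B + id_A + r_B)


def verifies(tag_fn, K_auth, id_B, id_A, message) -> bool:
    """Check the tag of message = (x, r_B, tau_B) with the given tag rule."""
    x, r_B, tau_B = message
    return hmac.compare_digest(tag_fn(K_auth, id_B, id_A, r_B, x), tau_B)
```

With tag_flawed, replacing x by any other value leaves the tag valid, so the two transcripts differ while the receiver still verifies the message. With tag_fixed, the same modification is rejected.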

Furthermore this invalidates goal 3. Indeed, when the adversary modifies id_B 
in m_B, this compels A to try all the authentication master keys, which helps the 
adversary to recognise which party B has sent the message m_B, hence to track 
that party (defeating goal 1 again). 

In order to achieve goal 2 (which concerns SAKE-AM only), A stores the 
pseudo-random value r_B received in the two previous sessions. This countermeasure 
is not enough and can easily be bypassed. The adversary merely intercepts, 
three consecutive times, an initial message m_B sent by the initiator B and 
not received by A (dropped by the adversary). Next the adversary lets A and 
B successfully complete one protocol run. The three messages m_B very likely 
carry pairwise distinct values r_B. When the adversary sends any of these messages, 
they are accepted by A because they carry an unknown value r_B, and 


1 In [5], Avoine et al. describe the message flow of a cryptographic protocol. Consequently, 
they indicate only the parameters that are necessary from a cryptographic 
point of view. 
because they all correspond to the previous epoch from A's perspective (i.e., 
computed with K'_{j-1}). Therefore A computes and sends a message m_A. Sending 
these three messages in rotation “flushes” A's memory of r_B values. Hence 
A keeps sending messages m_A in response. 
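
The bypass can be simulated with a toy model of A's memory. The class name and the first-in-first-out eviction policy are assumptions made for the example, and the MAC check is abstracted away (the three intercepted messages all carry valid tags for the previous epoch).

```python
from collections import deque


class TwoValueMemory:
    """Toy model of a responder that remembers only the last two r_B values."""

    def __init__(self, size: int = 2):
        self.seen = deque(maxlen=size)  # oldest value is evicted first

    def accept(self, r_B: bytes) -> bool:
        """Reject r_B if it is still remembered, otherwise store and accept it."""
        if r_B in self.seen:
            return False
        self.seen.append(r_B)
        return True


def replay_in_rotation(memory: TwoValueMemory, values, rounds: int):
    """Replay the intercepted values in rotation and collect A's decisions."""
    return [memory.accept(v) for _ in range(rounds) for v in values]
```

With three distinct intercepted values, every replay is accepted because the value being replayed has always just been evicted. With only two values, the third message is already rejected.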

The issues raised above break the security properties claimed in [1, Sects. 6.1 
and 9] (respecting the same security experiments and adversarial model consid- 
ered by Aghili et al.): replay, time-based attack, tracking, entity authentication. 


3.2 Countermeasures 


The vulnerabilities in Aghili et al.'s proposal can be thwarted as follows. In order 
to fix the issue in the entity authentication, the pseudo-random value r_a must 
also be involved in the computation of the MAC tag τ_B of message m_B. 

To thwart the replay attack, A must detect all values r_B previously received. 
This can be done efficiently with a Bloom filter [14] or a Cuckoo filter [19]. 
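
A minimal Bloom filter sketch shows how previously received values can be detected with a fixed amount of memory. The parameters (2^16 bits, 4 hash positions) and the derivation of positions from SHA-256 are assumptions chosen for the example, not recommendations.

```python
import hashlib


class BloomFilter:
    """Fixed-size set membership with false positives but no false negatives."""

    def __init__(self, m_bits: int = 1 << 16, k_hashes: int = 4):
        self.m = m_bits  # assumed to be a multiple of 8
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from SHA-256 with a one-byte counter prefix.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

A value that was inserted is always reported as present, so a replay is never missed. The price is that a fresh value may occasionally be reported as already seen, in which case the honest session has to be restarted.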

The time-based attack can be mitigated by equalising the time spent to 
explore the set of authentication keys (e.g., all keys are tried even when the 
correct one is found), or by randomly exploring this set [6]. 
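
The first mitigation (all keys are tried even when the correct one is found) can be sketched as follows. The function names and the HMAC-SHA256 instantiation are assumptions, and Python itself gives no constant-time guarantee: the sketch only illustrates that the number of MAC computations does not depend on the position of the matching key.

```python
import hashlib
import hmac


def mac(key: bytes, msg: bytes) -> bytes:
    """MAC over msg; HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def find_key_full_scan(keys, msg: bytes, tag: bytes):
    """Try every key, even after a match; return (index or -1, keys tried)."""
    match, tried = -1, 0
    for i, key in enumerate(keys):
        ok = hmac.compare_digest(mac(key, msg), tag)
        match = i if ok else match
        tried += 1
    return match, tried
```

Whatever the position of the matching key, the second component of the result equals the size of the key set.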

To forbid any tracking, each value id_B must be used in a single session only. 

The vulnerabilities we describe also call into question the correctness of the security 
proofs provided in [1], made in the computational model (using the game-based 
methodology [11,34]), and with the ProVerif verification tool [13]. In particular, 
the privacy property is not captured by the security model used in [1]. This 
highlights the importance of devising and using a suitable security model. 

The relevance of the countermeasures, which we present succinctly due to lack 
of space, is shown in the security proofs (see the full version of the paper [20]) 
for our privacy-preserving AKE protocol described in Sect. 5. 


4 Security Model 


In this section, we present our security model for PPAKE protocols. We use the 
model for authenticated key exchange protocols described by Avoine et al. [5] 
to prove the security of their SAKE and SAKE-AM protocols, which is based 
on the model of Brzuska, Jacobsen, and Stebila [15]. This model captures entity 
authentication, key indistinguishability, and forward secrecy in the symmetric- 
key setting. 

We extend this model by introducing a criterion for indistinguishability of 
identities. That is, in order to define the privacy property, we borrow the concept 
of virtual identifier from Ouafi and Phan [30] and Hermans et al. [24]. This 
concept allows hiding the identity of the party the adversary is interacting with. 
The privacy property guarantees not only that the identity of an end-device is 
hidden, but that two different protocol runs are unlinkable. Given the two-party 
protocol whose security we want to prove, and its deployment context, we 
aim at guaranteeing the end-device's privacy only. However, our model can be 
extended in a straightforward manner to provide privacy to any party involved 
in a protocol run (end-device and server). 

Finally, we follow the paradigm proposed by Schwenk et al. [33], and incor- 
porate the privacy property together with the other security properties. This 
approach guarantees that the different security properties are independent of 
each other. More specifically, our resulting model requires that, say, the key 
indistinguishability property holds even in the presence of attacks that adap- 
tively unmask identities. Conversely, confidentiality of identities is ensured even 
in the presence of queries that let the attacker reveal session keys. Hence our 
model is stronger than models where security properties are considered sepa- 
rately (e.g., privacy and key indistinguishability), and not all the adversarial 
queries are available in all the security experiments (e.g. [3,22]). 

In our model, the long-term symmetric keys shared by the two communicating 
parties cannot be given to the adversary before the session is completed (i.e., 
our security model does not capture key-compromise impersonation attacks [12]). 
However these keys can be disclosed once one of the two instances accepts (this 
captures forward secrecy). This makes our model stronger than other models 
used in the symmetric-key setting (e.g., [22]), and comparable in terms of pow- 
ers granted to the adversary to security models used in the asymmetric setting 
(e.g., [15,26]). 

We do think that this security model can serve as a tool to analyse other 
authenticated key exchange protocols that implement mechanisms to guarantee 
privacy. 


4.1 Execution Environment 


Parties. Let E be a set of end-devices, and S a set of servers. The type of a 
party P_i is denoted type(P_i) ∈ {end-device, server}. 

A two-party protocol is carried out by an end-device and a server. Each 
party P_i ∈ E ∪ S has an associated long-term key P_i.ltk, and is identified with 
two parameters: its permanent identifier which we also denote by P_i, and its 
current identifier P_i.id. The same long-term key is shared by a unique pair of 
parties (P_i, P_j). That is: P_i.ltk = P_j.ltk. 

In addition, a party P_i ∈ S stores a database in which each entry corresponds 
to the long-term key P_j.ltk of an end-device party, its current identifier P_j.id, 
along with its permanent identifier P_j. 


Instances. Each party can take part in multiple sequential executions of the 
protocol. We prohibit parallel executions of the protocol. Indeed, since the pro- 
tocol we propose is based on shared evolving symmetric keys, running multiple 
instances in parallel may cause some executions to abort.² This is the only 
restriction we demand compared to AKE security models used in the public-key 
setting. 


2 This is a technical feature of the SAKE and SAKE-AM protocols, which our 
PPSAKE protocol is based on. In this regard, we refer the reader to [5, Sect. 6]. 

Each run of the protocol is called a session. To each session of a party P_i, an 
instance π_i^s is associated which embodies this (local) session's execution of the 
protocol, and has access to the long-term key and current identifier of the party. 
P_i is called the parent of π_i^s, and the type of an instance is the type of its parent: 
type(π_i^s) = type(P_i) ∈ {end-device, server}. In addition, each instance maintains 
the following state specific to the session. 


– ρ: the role ρ ∈ {initiator, responder} of the session in the protocol execution, 
being either the initiator or the responder. 

– pid: the identity pid ∈ P of the intended communication partner of π_i^s. 

– α: the state α ∈ {⊥, running, accepted, rejected} of the instance. 

– sk: the session key derived by π_i^s. 

– κ: the status κ ∈ {⊥, revealed} of the session key π_i^s.sk. 

– sid: the identifier of the session. 

– b: a random bit b ∈ {0,1} sampled at initialisation of π_i^s. 


We use the notion of initiator and responder on the one hand, and end-device, 
server on the other hand. An initiator instance sends the first message of the 
protocol, whereas a responder instance responds to it. An end-device party hides 
its “real” identity, whereas a server party does not. In Sect. 5, we present two 
protocols that allow an end-device party to behave either as initiator or responder, 
and conversely a server can be either responder or initiator. Therefore the notion 
of end-device and server is mainly used in the privacy experiment in order to 
indicate the party whose identity the adversary aims to find. 

We put the following correctness requirements on the variables α, sk, sid and 
pid. For any two instances π_i^s, π_j^t, the following must hold: 

π_i^s.α = accepted ⇒ π_i^s.sk ≠ ⊥ ∧ π_i^s.sid ≠ ⊥   (1) 

π_i^s.α = π_j^t.α = accepted ∧ π_i^s.sid = π_j^t.sid ⇒ π_i^s.sk = π_j^t.sk ∧ π_i^s.pid = P_j ∧ π_j^t.pid = P_i   (2) 


Virtual identifier. In order to hide from the adversary which end-device party it is 
interacting with, the notion of virtual identifier is used. A virtual identifier vid = 
P_i|P_j refers to two parties P_i, P_j ∈ E, which are known to the adversary. The real 
involved party is designated by realvid(vid), depending on a secret bit b ∈ {0, 1}. 
This bit is sampled at initialisation of vid. If vid.b = 0, then realvid(vid) = P_i. 
If vid.b = 1, then realvid(vid) = P_j. In addition, type(vid) = end-device. 


Adversarial Queries. The adversary A is assumed to control the network, and 
interacts with the instances by issuing the following queries to them. 


– DrawParty(P_i, P_j): this query creates a virtual identifier vid = P_i|P_j, adds 
vid to the list L_vid, and returns vid. If P_i ∉ E or P_j ∉ E, then it returns ⊥. If 
P_i or P_j is used in a virtual identifier already in L_vid, then it returns ⊥. 


– NewSession(id, ρ, id'): this query creates a new instance π_i^s at party id, having 
role ρ, and intended partner id'. If type(id) = type(id'), the query aborts. If 
id is a virtual identifier, then the parent of π_i^s is realvid(id). If id' is a virtual 
identifier, then π_i^s.pid = realvid(id'). π_i^s.α is set to running. If ρ = initiator, it 
produces the first message of the protocol which is returned to the adversary. 

– Send(π_i^s, m): this query allows the adversary to send any message m to π_i^s. 
If π_i^s.α ≠ running, it returns ⊥. Otherwise π_i^s responds according to the 
protocol specification. 

– Corrupt(P_i): if type(P_i) = end-device, this query returns the long-term key 
P_i.ltk of P_i. If type(P_i) = server, this query returns all long-term keys P_j.ltk, 
P_j ∈ E, stored by P_i. If Corrupt(P_i) is the ν-th query issued by the adversary, 
then we say that P_i is ν-corrupted. For a party that has not been corrupted, 
we define ν = +∞. Moreover we say that a virtual identifier vid = P_i|P_j is 
corrupted if either P_i or P_j is corrupted. 

– Reveal(π_i^s): this query returns the session key π_i^s.sk, and π_i^s.κ is set to 
revealed. 

– Unmask(π_i^s): this query returns the permanent identifier P_i of π_i^s's parent. 

– Test(π_i^s): this query may be asked only once throughout the game. If π_i^s.α ≠ 
accepted, then it returns ⊥. Otherwise it samples an independent key sk_0 ←$ 
K, and returns sk_b with b = π_i^s.b, where sk_1 = π_i^s.sk. The key sk_b is called 
the Test-challenge-key. 

– Free(vid): this query removes vid from L_vid. Moreover, for any instance π_i^s such 
that either vid is the parent of π_i^s or π_i^s.pid = vid, if π_i^s.α ∈ {⊥, running}, 
then it sets π_i^s.α = rejected. 
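
The bookkeeping behind the DrawParty and Free queries can be sketched as follows. The class and method names are assumptions made for the example, None stands for ⊥, and the part of Free that sets pending instances to rejected is deliberately left out.

```python
import secrets


class Challenger:
    """Toy challenger that only manages virtual identifiers."""

    def __init__(self, end_devices):
        self.E = set(end_devices)
        self.L_vid = {}  # vid -> (P_i, P_j, secret bit b)

    def draw_party(self, P_i: str, P_j: str):
        """Create vid = P_i|P_j, or return None if the query is not allowed."""
        if P_i not in self.E or P_j not in self.E:
            return None
        in_use = {p for (a, c, _) in self.L_vid.values() for p in (a, c)}
        if P_i in in_use or P_j in in_use:
            return None
        vid = f"{P_i}|{P_j}"
        self.L_vid[vid] = (P_i, P_j, secrets.randbits(1))
        return vid

    def realvid(self, vid: str) -> str:
        """Party hidden behind vid: P_i if b = 0, P_j if b = 1."""
        P_i, P_j, b = self.L_vid[vid]
        return P_i if b == 0 else P_j

    def free(self, vid: str) -> None:
        """Remove vid so that its two parties can be drawn again."""
        self.L_vid.pop(vid, None)
```

The check on parties already in use prevents the adversary from drawing the same end-device in two live virtual identifiers, which would let it compare their behaviour directly.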


The adversary is an active Man-in-the-Middle which can interact with par- 
ties, and adaptively issue queries. The adversary is granted the ability to query 
the used identities of arbitrary session partners (with the Unmask query). Our 
goal is to consider a strong adversary, in the sense that we allow as far as possible 
all queries (Corrupt, Reveal, Send, Unmask, etc.) except the queries that allow 
trivial attacks (i.e., attacks that allow the adversary to win, regardless of the 
design of the protocol). 


Definition 1 (Partnership). Two instances π_i^s and π_j^t are partners if π_i^s.sid = 
π_j^t.sid. 


A privacy-preserving authenticated key exchange protocol (PPAKE) is a two- 
party protocol satisfying the correctness requirements 1 and 2, and where the 
security is defined in terms of a PPAKE experiment played between a challenger 
and an adversary. This experiment uses the execution environment described 
above. The adversary can win the PPAKE experiment in one of three ways: (i) 
by making an instance accept maliciously, (ii) by guessing the secret bit of the 
Test-instance, or (iii) by guessing the secret bit of the privacy experiment. 


Definition 2 (Entity Authentication (EA)). An instance π_i^s of a protocol 
Π is said to have accepted maliciously in the PPAKE security experiment with 
intended partner P_j, if 

1. π_i^s.α = accepted and π_i^s.pid = P_j when A issues its ν_0-th query, 
2. P_i and P_j are ν- and ν'-corrupted with ν_0 < ν, ν_0 < ν', and 
3. there is no unique instance π_j^t such that π_i^s and π_j^t are partners. 

The adversary's advantage is defined as its winning probability: 
adv^{ent-auth}_Π(A) = Pr[A wins the EA game]. 


Definition 3 (Key Indistinguishability). An adversary A against a protocol 
Π, that issues its Test-query to instance π_i^s during the PPAKE security 
experiment, answers the Test-challenge-key correctly if it terminates with output 
b', such that 

1. π_i^s.α = accepted and π_i^s.pid = P_j when A issues its ν_0-th query, 

2. π_i^s.κ ≠ revealed and P_i is ν-corrupted with ν_0 < ν, 

3. for any partner instance π_j^t of π_i^s, we have that π_j^t.κ ≠ revealed and P_j is 
ν'-corrupted with ν_0 < ν', and 

4. π_i^s.b = b'. 

The adversary's advantage is defined as 

adv^{key-ind}_Π(A) = |Pr[π_i^s.b = b'] - 1/2|. 

Note that the definition of key indistinguishability incorporates a requirement 
for forward secrecy. 


Definition 4 (Privacy). An adversary A against a protocol Π wins the privacy 
game during the PPAKE security experiment, if it terminates with output 
(π_i^s, b'), such that 

– type(π_i^s) ≠ type(π_i^s.pid) 
– If type(π_i^s) = server, let vid be the (virtual) identifier of π_i^s.pid: 
  1. π_i^s.α = accepted when A issues its ν_0-th query, 
  2. π_i^{s'}.α = accepted when A issues its ν_1-th query, where π_i^{s'} is the instance 
     created after π_i^s such that its parent and intended partner are the same as 
     those of π_i^s, 
  3. the parents of π_i^s and vid are ν- and ν'-corrupted with ν_1 < ν, ν_0 < ν', 
  4. A did not issue an Unmask query to π_j^t for any instance π_j^t such that π_j^t 
     is partnered with π_i^s and type(π_j^t) = end-device, and 
  5. vid.b = b'. 
– If type(π_i^s) = end-device, let vid be the (virtual) identifier of the parent of π_i^s: 
  1. π_i^s.α = accepted when A issues its ν_0-th query, 
  2. π_j^{t'}.α = accepted when A issues its ν_1-th query, where π_j^{t'} is created after 
     π_j^t for any partner π_j^t of π_i^s, such that π_j^{t'}.pid is π_i^s's parent and π_j^{t'}'s parent 
     is π_i^s.pid, 
  3. vid and π_i^s.pid are ν- and ν'-corrupted with ν_0 < ν, ν_1 < ν', 
  4. A did not issue an Unmask query to π_i^s, and 
  5. vid.b = b'. 

The adversary's advantage is defined as 

adv^{privacy}_Π(A) = |Pr[vid.b = b'] - 1/2|. 


Definitions 2, 3, and 4 allow the adversary to corrupt an instance involved 
in the security experiment (after some time, in order to exclude trivial attacks). 
Therefore, protocols secure with respect to Definition 5 below provide forward 
secrecy. We do not allow the targeted instance to be corrupted before it accepts. 
That is, this security model does not capture key-compromise impersonation 
attacks (KCI) [12] since that would allow trivially breaking key exchange proto- 
cols solely based on shared symmetric keys. 


Definition 5 (PPAKE Security). We say that a two-party protocol Π is 
a secure PPAKE protocol if Π satisfies the correctness requirements 1 and 2, 
and for all probabilistic polynomial-time adversaries A, adv^{ent-auth}_Π(A), 
adv^{key-ind}_Π(A), and adv^{privacy}_Π(A) are negligible functions of the security 
parameter. 


4.2 Security Definitions of the Building Blocks 


In our proofs, we rely upon standard security definitions. The security definition 
of a pseudo-random function (PRF) is taken from Bellare, Desai, Jokipii, and 
Rogaway [8], and that of a MAC strongly unforgeable under chosen-message 
attacks from Bellare and Namprempre [9]. We rely also on the definition of 
matching conversations initially proposed by Bellare and Rogaway [10], and 
modified by Jager, Kohlar, Schage, and Schwenk [26]. 


5 Privacy-Preserving SAKE/SAKE-AM 


In this section we present the protocol obtained when applying to [1] the mitiga- 
tions described in Sect. 3. We call these corrected versions respectively Privacy- 
preserving SAKE (PPSAKE) and Privacy-preserving SAKE-AM (PPSAKE- 
AM). 


Description. The protocol PPSAKE is depicted in Fig. 3. It illustrates the 
generic case when party A is a server communicating with a set of end-devices 
(parties B). As in [1], B uses either an ephemeral identity parameter id_B 
(which evolves the same way as its master keys K and K') or a pseudo-random 
value, depending on whether the previous protocol run has completed successfully 
(this is tracked with the flag φ, initialised to 0). When the transmitted value 
equals id_B, A can retrieve the set of parameters corresponding to B in its 
database db. For each party B, A stores the identity parameter of three 
consecutive epochs, as it is done with the authentication master key (each entry 
in db is of the form: K, (id_{B,j}, K'_j), (id_{B,j-1}, K'_{j-1}), (id_{B,j+1}, K'_{j+1})). 

A                                          B
(id_A, db, [htable])                       (id_B, K, K', φ)

r_A ←$ {0,1}^λ
                                           if (φ = 0)
                                               id'_B ← id_B
                                               φ ← 1
                                           else if (φ = 1)
                                               id'_B ←$ {0,1}^λ
                                           r_B ←$ {0,1}^λ
                                           τ_B ← Mac(K', id'_B || id_A || r_B || r_A)
                                           m_B ← id'_B || r_B || τ_B
                          m_B
                 <---------------------
if (id'_B ∈ db.id)
    entry ← get corresponding entry
    if (verif-table(htable, m_B) = true)
        abort
    if (verif-entry(entry, m_B) = false)
        abort
else
    entry ← find-entry(m_B)
    if (entry = ∅)
        abort
    if (verif-table(htable, m_B) = true)
        abort
insert-table(htable, m_B)
τ_A ← Mac(K', ε || id_A || id'_B || r_A || r_B)
m_A ← ε || id'_B || r_A || τ_A
                          m_A
                 --------------------->
                                           if (Vrf(K', ε || id_A || id'_B || r_A || r_B, τ_A) = false)
                                               abort
                                           if (ε = 1)
                                               upd_B
                                           kdf; upd_B
                                           φ ← 0
                                           τ'_B ← Mac(K', id'_B || id_A || r_B || r_A)
                                           m'_B ← id'_B || τ'_B
                          m'_B
                 <---------------------
if (ε = 0)
    K' ← K'_j
    if (Vrf(K', id'_B || id_A || r_B || r_A, τ'_B) = false)
        abort
else if (ε = 1)
    K' ← K'_{j+1}
    if (Vrf(K', id'_B || id_A || r_B || r_A, τ'_B) = false)
        abort
kdf; upd_A
τ'_A ← Mac(K', id'_B || r_A || r_B)
m'_A ← id'_B || τ'_A
                          m'_A
                 --------------------->
                                           if (Vrf(K', id'_B || r_A || r_B, τ'_A) = false)
                                               abort

Fig. 3. The PPSAKE/PPSAKE-AM protocol. Elements surrounded by a blue dashed 
box appear only in PPSAKE. Elements boxed in blue appear only in PPSAKE-AM. 
(Color figure online) 

When B transmits a pseudo-random value (instead of id_B), A must 
explore (in constant time) the database (function find-entry) in order to find the 
matching authentication master key (i.e., the key that allows verifying the MAC tag 
τ_B). This happens only if the previous session was not successful. 
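
The choice made by B between its identity parameter and a pseudo-random value can be sketched as follows. The class and method names, the 16-byte length, and in particular the hash-based evolution of id_B are assumptions made for the example; the sketch only illustrates the role of the flag φ.

```python
import hashlib
import os


class EndDevice:
    """Toy end-device that tracks whether its previous run completed."""

    def __init__(self, id_B: bytes):
        self.id_B = id_B
        self.phi = 0  # 0: previous run completed, 1: it did not

    def identity_for_session(self) -> bytes:
        """Return id_B after a completed run, a fresh random value otherwise."""
        if self.phi == 0:
            self.phi = 1
            return self.id_B
        return os.urandom(len(self.id_B))

    def session_completed(self) -> None:
        # Placeholder evolution of id_B (assumption): hash of the old value.
        self.id_B = hashlib.sha256(b"id" + self.id_B).digest()[: len(self.id_B)]
        self.phi = 0
```

In this sketch a given value of id_B is sent in at most one session: either the run completes and id_B evolves, or the run fails and the next session carries a random value.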

The same identity value is used throughout a given session, and a new value is used in the 
next session (see Fig. 4). The identity parameter of B appears in all messages (except 
the first one in PPSAKE). The purpose is to allow A to recognise which keys to 
use (when the value is output by the update function), but also to allow B to detect 
which messages sent by A are intended for it. Likewise, with the transmitted value (equal to id_B or 
pseudo-random), A can correlate the messages m_B, m'_B and the corresponding 
parameters in its database. The identity parameter id_A used by A is explicit, and 
never changes, as the privacy property aims at protecting B. 


    update:  $id_{B,0}$ -> $id_{B,1}$ -> $id_{B,2}$ -> $id_{B,3}$ -> ...

    update:  $K'_0$ -> $K'_1$ -> $K'_2$ -> $K'_3$ -> ...

    update:  $K_0$ -> $K_1$ -> $K_2$ -> $K_3$ -> ...

    session keys:  $sk_0$, $sk_1$, $sk_2$, $sk_3$


Fig. 4. Party A stores authentication master keys and identifiers corresponding to three 
consecutive epochs (j — 1, j, j + 1), and one derivation master key (illustration with 
j = 2 with the blue dashed box). Party B stores one sample of each master key, and 
identifier (boxed in blue). (Color figure online) 


In PPSAKE-AM, party A stores efficiently (e.g., using a Bloom or a Cuckoo
filter) the messages $m_B$ it receives (this is not necessary in PPSAKE). Upon
reception of a message $m_B$, A verifies whether it has already been received
(insertion and lookup are done with the functions insert-table and verif-table,
respectively). This prevents A from responding to a replayed message $m_B$. We
use a global table. Note that using one table per entry in the database db (i.e.,
one table per end-device party) may lead to privacy breaches depending on how
the privacy experiment is defined (in particular if the table can be revealed).
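
The replay check described above can be sketched as follows. This is only an illustration under stated assumptions: an exact Python set stands in for the Bloom or Cuckoo filter, each message is stored as a SHA-256 digest, and the names ReplayTable and accept_first_message are hypothetical (only insert_table and verif_table mirror the functions named in the text). In the protocol the insertion happens after the MAC checks; the helper below leaves those checks out.

```python
import hashlib


class ReplayTable:
    """Global table of already-seen messages m_B (illustrative stand-in)."""

    def __init__(self):
        # An exact set replaces the probabilistic filter suggested in the text.
        self._seen = set()

    @staticmethod
    def _digest(m_b: bytes) -> bytes:
        return hashlib.sha256(m_b).digest()

    def verif_table(self, m_b: bytes) -> bool:
        """Return True if m_B was already received, i.e. it is a replay."""
        return self._digest(m_b) in self._seen

    def insert_table(self, m_b: bytes) -> None:
        """Record m_B so that a later copy is detected."""
        self._seen.add(self._digest(m_b))


def accept_first_message(table: ReplayTable, m_b: bytes) -> bool:
    """Accept m_B at most once (MAC verification is omitted in this sketch)."""
    if table.verif_table(m_b):
        return False  # replayed message: A aborts
    table.insert_table(m_b)
    return True
```

A single shared instance corresponds to the global table chosen in the text; keeping one instance per database entry would reintroduce the per-device state that the paragraph warns about.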

We observe that, depending on the communication layer, it may not be necessary
to include $\overline{id}_B$ in all messages. For instance, if the messages are sent
through a radio link, A and B can negotiate a specific radio frequency they will
use during the session. Likewise, if the messages are sent through a wired link,
some ephemeral per-session identifier (e.g., IP address or any equivalent) can be
used by A and B to discriminate the messages intended for them. Nonetheless,
in order to make the protocol agnostic with respect to the communication layer,
we opt for an ephemeral identifier $\overline{id}_B$ in the messages.
As in [5], using PPSAKE-AM together with PPSAKE (both protocols are
based on the same inner functions) allows any party to be either initiator or 
responder of a session, such that the smallest amount of calculation is always 
done by the same party (i.e., the low-resource end-device). 


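if (Vrf($K'_j$, $\overline{id}_B \| id_A \| r_B \| r_A$, $\tau_B$) = true)
    $\delta_{AB} \leftarrow 0$
    $K' \leftarrow K'_j$
    kdf
    $upd_A$
    $\epsilon \leftarrow 0$
    return true
else if (Vrf($K'_{j-1}$, $\overline{id}_B \| id_A \| r_B \| r_A$, $\tau_B$) = true)
    $\delta_{AB} \leftarrow 1$
    $K' \leftarrow K'_{j-1}$
    $\epsilon \leftarrow 1$
    return true
else if (Vrf($K'_{j+1}$, $\overline{id}_B \| id_A \| r_B \| r_A$, $\tau_B$) = true)
    return true
else
    return false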


Fig. 5. Pseudo-code of function verif-entry. Elements surrounded by a blue dashed box 
appear only in PPSAKE (not in PPSAKE-AM). (Color figure online) 


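Notations. The notations $upd_A$ and $upd_B$ are defined as follows:


— $upd_A$ corresponds to
1. $K \leftarrow update(K)$
2. $id_{B,j-1} \leftarrow id_{B,j}$
3. $id_{B,j} \leftarrow id_{B,j+1}$
4. $id_{B,j+1} \leftarrow update(K'_{j+1}, id_{B,j+1})$
5. $K'_{j-1} \leftarrow K'_j$
6. $K'_j \leftarrow K'_{j+1}$
7. $K'_{j+1} \leftarrow update(K'_{j+1})$
— $upd_B$ corresponds to
1. $K \leftarrow update(K)$
2. $id_B \leftarrow update(K', id_B)$
3. $K' \leftarrow update(K')$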
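
The two update procedures can be sketched in Python to check that both parties stay aligned. The sketch rests on assumptions that are not taken from the paper: HMAC-SHA256 plays the role of the PRF behind update, the constant C is arbitrary, and the class and helper names (StateA, StateB, demo_states) are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

C = b"\x00"  # hypothetical constant c used when a master key is updated


def prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as a stand-in PRF (an assumption made for this sketch)."""
    return hmac.new(key, data, hashlib.sha256).digest()


@dataclass
class StateB:
    K: bytes        # derivation master key
    K_auth: bytes   # authentication master key K'
    id_b: bytes     # identity parameter id_B


@dataclass
class StateA:
    K: bytes
    ids: list       # [id_{B,j-1}, id_{B,j}, id_{B,j+1}]
    K_auth: list    # [K'_{j-1}, K'_j, K'_{j+1}]


def upd_b(s: StateB) -> None:
    s.K = prf(s.K, C)
    s.id_b = prf(s.K_auth, s.id_b)  # uses K' before K' itself is updated
    s.K_auth = prf(s.K_auth, C)


def upd_a(s: StateA) -> None:
    s.K = prf(s.K, C)
    new_id = prf(s.K_auth[2], s.ids[2])   # update(K'_{j+1}, id_{B,j+1})
    new_key = prf(s.K_auth[2], C)         # update(K'_{j+1})
    s.ids = [s.ids[1], s.ids[2], new_id]  # shift the three-epoch window
    s.K_auth = [s.K_auth[1], s.K_auth[2], new_key]


def demo_states(k0: bytes, k_auth0: bytes, id0: bytes):
    """Build matching states for epoch j = 1 from epoch-0 values."""
    k_auth1, id1 = prf(k_auth0, C), prf(k_auth0, id0)
    k_auth2, id2 = prf(k_auth1, C), prf(k_auth1, id1)
    k1 = prf(k0, C)
    a = StateA(K=k1, ids=[id0, id1, id2], K_auth=[k_auth0, k_auth1, k_auth2])
    b = StateB(K=k1, K_auth=k_auth1, id_b=id1)
    return a, b
```

After one call to each procedure, B's values coincide with A's middle slot; if B then updates once more on its own, its values coincide with A's third slot, which is the one-epoch gap that the three stored epochs are meant to absorb.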
The function verif-entry (see Fig. 5) takes as input an entry entry $\in$ db, and
a message $m_B$ (we assume that the other values used in verif-entry are “global”
parameters). It outputs true if entry allows verifying $m_B$ correctly.

The function find-entry (see Fig. 6) takes as input a message $m_B = \overline{id}_B \| r_B \|
\tau_B$, and outputs either an entry entry $\in$ db or $\emptyset$.

The function verif-table takes as input a hash table htable, and a value $x$, and
outputs true if $x$ is present in htable. Otherwise it returns false.

The function insert-table takes as input a hash table htable, and a value $x$,
and inserts $x$ into htable.


foreach entry $\in$ db
    if (verif-entry(entry, $m_B$) = true)
        return entry
return $\emptyset$


Fig. 6. Pseudo-code of function find-entry 
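
A simplified Python rendering of find-entry and of the verification part of verif-entry is given below as a sketch. Several things are assumptions made for illustration: HMAC-SHA256 stands in for Mac, entries and messages are plain dictionaries with hypothetical field names, all fields have fixed lengths so that concatenation is unambiguous, and the state changes performed by verif-entry (kdf, $upd_A$, the flag $\epsilon$) as well as $r_A$ are left out. The loop returns at the first match, so it does not attempt the constant-time exploration mentioned earlier.

```python
import hashlib
import hmac
from typing import Optional, Sequence


def mac(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as a stand-in for Mac (an assumption of this sketch)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verif_entry(entry: dict, id_a: bytes, m_b: dict) -> Optional[int]:
    """Return the index of the stored key that verifies the tag, or None.

    entry["K_auth"] holds [K'_{j-1}, K'_j, K'_{j+1}].
    """
    data = m_b["id"] + id_a + m_b["r_b"]
    for idx in (1, 0, 2):  # same order as Fig. 5: j, then j-1, then j+1
        expected = mac(entry["K_auth"][idx], data)
        if hmac.compare_digest(expected, m_b["tag"]):
            return idx
    return None


def find_entry(db: Sequence[dict], id_a: bytes, m_b: dict) -> Optional[dict]:
    """Linear scan of the database, as in Fig. 6."""
    for entry in db:
        if verif_entry(entry, id_a, m_b) is not None:
            return entry
    return None
```

The cost of the scan is up to three MAC computations per entry, which is why it is only needed when B had to fall back to a pseudo-random identifier.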


6 Security of Privacy-Preserving SAKE/SAKE-AM 


In this section we prove that PPSAKE and PPSAKE-AM are secure PPAKE 
protocols according to Definition 5. We refer the reader to [5] regarding the 
soundness of PPSAKE and PPSAKE-AM (the proof is the same as that of 
SAKE and SAKE-AM in this regard). 

In order to prove that the protocol PPSAKE/PPSAKE-AM is a secure
PPAKE protocol, we use the execution environment described in Sect. 4.1. We
define the partnering between two instances with the notion of matching
conversations. That is, we define sid to be the transcript, in chronological order, of all
the (valid) messages sent and received by an instance during the key exchange,
except, possibly, the last one. Furthermore, we choose the function update to be a
keyed PRF, that is $update : (k, x) \mapsto PRF(k, x)$, and we define:


— $update(k) = PRF(k, c)$ if $k$ is a derivation ($K$) or authentication ($K'$) master
key, and $c$ is some (constant) value
— $update(K', id_B) = PRF(K', id_B)$


Theorem 1. The protocol $\Pi \in \{\text{PPSAKE}, \text{PPSAKE-AM}\}$ is a secure PPAKE
protocol, and for any probabilistic polynomial time adversary $\mathcal{A}$ in the PPAKE
security experiment against $\Pi$

$adv^{ent-auth} \le nq\left((nq - 1)2^{-\lambda} + (n_E(q - 1) + 2)\,adv^{prf}_{update} + (n_E + 1)\,adv^{suf-cma}_{Mac}\right)$

$adv^{key-ind} \le nq\left((q - 1)\,adv^{prf}_{update} + adv^{prf}_{KDF}\right) + adv^{ent-auth}$

$adv^{priv} \le nq\left(q \cdot adv^{prf}_{update} + 2\,adv^{prf}_{Mac} + 2^{-\kappa}\right) + adv^{ent-auth}$

where $n_E$ is the number of end-device parties, $n_S$ the number of server parties,
$n = n_E + n_S$, $q$ the maximum number of instances (sessions) per party, $\kappa$ the size
of the derivation master key $K$, $\lambda$ the size of the pseudo-random values ($r_A$, $r_B$),
and $adv^{prf}_{update}$, $adv^{suf-cma}_{Mac}$, $adv^{prf}_{KDF}$, and $adv^{prf}_{Mac}$ are the advantages of an adversary
in breaking, respectively, the PRF-security of update, the SUF-CMA-security of Mac,
the PRF-security of KDF, and the PRF-security of Mac.


The extended proof of Theorem 1 is presented in the full version of the paper 
[20]. 


7 Conclusion 


In this paper we have investigated the field of privacy-preserving authenticated 
key exchange protocols (PPAKE). 

First we have made a cryptographic analysis of a previous PPAKE protocol
intended for the IoT [1], and shown that most of its security properties, including
privacy, are broken. Furthermore we have described countermeasures that
prevent these vulnerabilities. The attacks that we exhibit question the
correctness of the security proofs provided in [1], and highlight the importance
of using a suitable security model.

Secondly, we have presented a security model which captures the security 
properties of a PPAKE protocol: entity authentication, key indistinguishability, 
forward secrecy, and privacy. The approach that we take guarantees that the 
different security properties are independent of each other, which yields a strong 
security model. We do think that this security model can serve as a tool to 
analyse other authenticated key exchange protocols that implement mechanisms 
to guarantee privacy. 

Finally, we have described a PPAKE protocol in the symmetric-key setting 
which is suitable for constrained devices. We have formally proved the security 
of this protocol in our strong model. 
Abstract. This paper studies quantitative relationships between privacy,
verifiability, accountability, and coercion-resistance of voting protocols.
We adapt existing definitions to make them more readily comparable
with each other and determine which bounds a certain requirement on
one property imposes on some other property. It turns out that, in terms
of the proposed definitions, verifiability and accountability do not necessarily
put constraints on privacy and coercion-resistance. However, the
relations between these notions become more interesting in the context
of particular attacks. Depending on the assumptions and the attacker’s
goal, voter coercion may benefit from verifiability that is too weak as well
as from verifiability that is too strong.


Keywords: Security and privacy metrics · Privacy · Anonymity ·
Verifiability · Voting


1 Introduction 


Voting is a complex process subject to a number of requirements such as eli- 
gibility, generality, uniformity, freedom of choice, tally integrity, accessibility, 
etc. [4,10,19,24]. In order to implement these requirements, a number of mea- 
sures can be applied. For example, in order to express one’s preference freely 
and withstand coercion, voting privately is often required. Tally integrity, on 
the other hand, can be achieved via various verification procedures. 

Even though both the privacy and verifiability of voting are well-motivated, 
they are at least partially contradictory. Intuitively, when targeting full pub- 
lic verifiability without any trust assumptions, it seems necessary to also open 
all the personalised votes, but this causes privacy loss and potential coercion 
issues. Of course, this intuition is very informal and the situation becomes more 
complicated when we consider particular definitions for privacy and verifiability. 

In order to study the connections between the two notions, the corresponding 
definitions must be given in comparable terms. However, it is far from being clear 
which terms are the best suited for this comparison. Working towards definitions 
that can be quantitatively compared to each other, and coming up with some 
comparison results, are the main aims of the current paper. 


© Springer Nature Switzerland AG 2022 
2 Related Work


There are many definitions of privacy in the context of voting, and an extensive 
survey discussing their advantages and drawbacks can be found in [3]. Relations 
between privacy and coercion-resistance for certain formal definitions of these 
notions have been shown in [9]. In this work, we are using definitions of privacy 
and coercion-resistance that originate from [16]. The benefit of these definitions
is that they allow the corresponding properties to be measured quantitatively. We
instantiate our definitions of verifiability and accountability in the KTV frame- 
work [7,15,17]. This framework provides generic definitions for verifiability and 
accountability, and many other, more specific definitions of verifiability can be 
instantiated in this framework. Among other results, [16] shows the relation 
between privacy and coercion resistance, and [15] shows the relation between 
verifiability and accountability. In all these works, the agents (i.e. the voters and 
the authorities) of a voting protocol are modeled as some processes, typically 
specified in pi-calculus. 

The KTV framework relies on the notion of end-to-end (E2E, global) verifi- 
ability, where voters and external observers are able to check whether the final 
result corresponds to the actual choices of honest voters. An alternative is to con- 
sider universal and individual verifiability as separate properties [23]. Previous 
research has established the following: 


— There can be no unconditional privacy if there is universal verifiability [5]. 
— There can be no privacy if there is no individual verifiability [8]. 


In addition, [5] proves that universal verifiability and receipt-freeness cannot 
be achieved simultaneously unless private channels are available. A receipt is 
a witness which allows verifying in an unambiguous way the vote of a certain 
voter. Intuitively, the existence of a receipt may lead to voter coercion. Different 
types of realistic coercion methods, both legal and illegal, are discussed in [11]. 

It has been noted in [14] that universal and individual verifiability are not 
sufficient for E2E verifiability [12]. Indeed, by definition, universal verifiability 
only checks that the final result corresponds to the submitted votes, but it does 
not require that the votes are well-formed (e.g. that there are no negative votes). 
Also by [14], universal verifiability is not necessary for E2E verifiability. 

In [8], it is shown how manipulation of even one vote may break privacy 
by observing the change that it caused in the tally. It is important that the 
attacker knows whose vote he is trying to change, so privacy requires individual 
verifiability. The proposed attack breaks a particular privacy definition, which 
says that the attacker should not be able to distinguish two protocol transcripts 
where some honest voters Alice and Bob have decided to swap their votes. The 
attacker may drop Alice’s vote in both transcripts, and observe the difference 
in the tally of the two transcripts to determine what Alice’s vote actually was. 
Such a privacy definition is very strong, and in practice, the attacker does not 
actually have access to two alternative voting transcripts. If there are many 
voters, dropping a single vote does not help much in actually guessing some 
other votes. Nevertheless, if the attacker has a strong prior knowledge of the 
other voters’ choices, such an attack may allow learning the vote of the victim.
In this work, we consider similar attacks w.r.t. the privacy definition of [16], 
which allows assessing the severity of the attack quantitatively. 

An interesting approach to evaluating voting systems in terms of distributional
differential privacy (DDP) has been proposed in [18]. While differential privacy is
often achieved by adding noise to the system, which is unacceptable for voting, DDP
is achieved by considering the distribution of votes as a source of randomness.


3 Preliminaries 


3.1 Protocols 


In this section, we present a generic framework for the definitions considered in 
this paper. The framework originates from [7,15,16] and is provided with some 
simplifications, excluding details that are not relevant for this paper. 

First of all, we need the notion of a process that can perform internal com- 
putation and can communicate with other processes by sending messages via 
(external) input/output channels. 


Definition 1 (Process). A process is a set of probabilistic polynomial-time
interactive Turing machines (also called programs) that are connected via named
tapes (also called channels). We denote by $\Pi(I,O)$ the set of all processes with
external input channels $I$ and external output channels $O$. A process defines a
family of probability distributions over runs, indexed by the security parameter
$\eta$. The concurrent composition of processes $\pi$ and $\pi'$ is denoted by $\pi \| \pi'$.


A protocol is not a process by itself, but rather a collection of building blocks 
that will be used to define a process. As noted in [16], since the quantitative level 
of privacy, coercion-resistance, and verifiability of a voting protocol depends on 
several parameters such as the number of voters and the number of choices, we 
consider a protocol instantiation for which these parameters are fixed. 


Definition 2 (Protocol instantiation). A protocol instantiation is a tuple
$P = (\Sigma, Ch, In, Out, \{\Pi_a\}_{a \in \Sigma})$ where


— $\Sigma$ is a set of protocol agents.

— $Ch$ is a set of protocol channels.

- $In$ and $Out$ are functions from $\Sigma$ to $2^{Ch}$ (i.e. assignments of input and output
channels for each protocol agent) such that $In(a) \cap In(b) = \emptyset$ and $Out(a) \cap
Out(b) = \emptyset$ for all $a, b \in \Sigma$, $a \neq b$.

- $\Pi_a \subseteq \Pi(In(a), Out(a))$ for $a \in \Sigma$ is the set of honest programs that can be run
by the agent $a$.


The randomness of agent behaviour, such as the probability distribution of
choices of an honest voter, is covered by $\Pi_a$. Particular probability distributions
are not relevant for the results of this paper.

A protocol instance is the process that will actually be executed. 
Definition 3 (Protocol instance, run). Let $P = (\Sigma, Ch, In, Out, \{\Pi_{a_1}, \dots,
\Pi_{a_n}\})$ for $\Sigma = \{a_1, \dots, a_n\}$ be a protocol instantiation.


- An instance of P is a process $\pi_P = \pi_{a_1} \| \dots \| \pi_{a_n}$ where $\forall a \in \Sigma : \pi_a \in \Pi_a$.
- A run of P is a run of some instance of P.


Similarly to [7,16], we have not included processes of dishonest parties into
the definitions of $P$ and $\pi_P$. Instead, the dishonest parties are subsumed by a
special adversary process.


Definition 4 (Adversary). A protocol instance $\pi_P$ is typically run in parallel
with an adversary process $\pi_A$ as a process $\pi := \pi_P \| \pi_A$.

There is a bidirectional channel between the adversary A and each protocol
agent $a_i \in \Sigma$. The adversary can corrupt an agent $a_i \in \Sigma$ by sending a special
message corrupt. Upon receiving such a message, $a_i$ reveals its internal state to A
and from then on is controlled by A, i.e. runs a dummy process dum which simply
forwards all messages between A and the interface of $a_i$ in $\pi_P$. Some agents (honest
users and incorruptible authorities) ignore corrupt messages. Public information
(such as the election result) is output to A even without corruption.


At the end of a run, $\pi_A$ produces some output $y$. We use the notation $\pi \mapsto y$
to say that the output of $\pi_A$ in a run of $\pi$ is $y$.


We say that an agent $a \in \Sigma$ is honest in a run of $\pi := \pi_A \| \pi_P$ if $a$ has not
been corrupted in this run, i.e. has not accepted the message corrupt. We use the
notation $\pi \models dis(a)$ to denote the event (viewing $\pi$ as a probability distribution
over runs) that the agent $a$ has been corrupted.

The condition dis(a) can be viewed as a certain property of a protocol P.
A property is a function that takes as input a run of a process $\pi$ and returns
a boolean value, telling whether that property is satisfied. For a fixed protocol
instantiation P, a property can be viewed as a subset of runs of P.


Definition 5 (Protocol property). A property $\gamma$ of P defines a subset of the
set of all runs of P. By $\neg\gamma$ we denote the complement of $\gamma$, i.e. the set of runs
that do not satisfy $\gamma$.


In order to reason about probability distributions of protocol runs taking
into account the security parameter $\eta$, we will need the following definition.


Definition 6 (negligible, overwhelming, $\delta$-bounded [7,15-17]). A function
$f : \mathbb{N} \to [0,1]$ is negligible if, for every $c > 0$, there exists $\eta_0$ such that $f(\eta) < \frac{1}{\eta^c}$
for all $\eta > \eta_0$. The function $f$ is overwhelming if the function $1 - f$ is negligible.
A function $f$ is $\delta$-bounded if, for every $c > 0$, there exists $\eta_0$ such that $f(\eta) <
\delta + \frac{1}{\eta^c}$ for all $\eta > \eta_0$.
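
As a quick illustration of Definition 6, consider the three functions below. They are chosen for this example only and do not come from the paper.

```latex
% Example functions, chosen for illustration only.
\begin{align*}
f(\eta) &= 2^{-\eta}
  && \text{is negligible: for every } c > 0,\ 2^{-\eta} < \eta^{-c}
     \text{ for all sufficiently large } \eta, \\
g(\eta) &= 1 - 2^{-\eta}
  && \text{is overwhelming: } 1 - g = f \text{ is negligible}, \\
h(\eta) &= \tfrac{1}{4} + 2^{-\eta}
  && \text{is } \tfrac{1}{4}\text{-bounded: } h(\eta) < \tfrac{1}{4} + \eta^{-c}
     \text{ for all sufficiently large } \eta.
\end{align*}
```

Taking $\delta = 0$ in the last condition gives back the condition for negligibility, so a function is 0-bounded exactly when it is negligible.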


The summary of process-related notation used in this paper is given in Table 1. 
Table 1. Table of notations. For events, $\pi$ is viewed as a distribution of runs.


Notation | Type | Meaning
$\pi^{(\eta)}$ | process | A process $\pi$ where all programs use the security parameter $\eta$.
$\pi_1 \| \pi_2$ | process | Concurrent composition of processes $\pi_1$ and $\pi_2$.
$\pi(\vec{x})$ | process | A process $\pi$ running with inputs $\vec{x}$.
$\pi_{P \setminus \Sigma'}$ | process | Concurrent composition of all subprocesses of $\pi_P$, excluding subprocesses $\pi_a$ of agents $a \in \Sigma' \subseteq \Sigma$.
$\pi_{P \setminus \vec{i}}$ | process | Same as $\pi_{P \setminus \Sigma'}$ for $\Sigma' = \{v_{i_1}, \dots, v_{i_k}\}$, where $\vec{i} \subseteq \{1, \dots, |V|\}$.
$\pi \mapsto (a : y)$ | event | The final output of the agent $a \in \Sigma$ in the run of $\pi$ is $y$.
$\pi \mapsto y$ | event | The final output of the adversary $\pi_A$ in the run of $\pi$ is $y$.
$\pi \models \gamma$ | event | A run of $\pi$ satisfies a property $\gamma$.
$dis(a)$ | property | The agent $a \in \Sigma$ has been corrupted.
$voted(i,c)$ | property | The voter $v_i \in V$ cast a vote $c$.
$F_{dis}$ | set | The set of boolean formulae over literals $dis(a)$ for $a \in \Sigma$.


3.2 Notation Related to Voting Protocols 


We will use $V$ to denote the set of voters, $C$ the set of possible choices to select
from by the voters (a choice does not necessarily represent a single candidate),
and $R$ the set of possible election results. Let $V = V_H \cup V_D$ with $V_H \cap V_D = \emptyset$, where
$V_H$ are honest voters, and $V_D$ are dishonest voters (controlled by the adversary).

Let $|V| = n = n_h + n_d$ be the total number of voters, where $n_h = |V_H|$ and
$n_d = |V_D|$. We assume that the voters are somehow ordered, and the voter with
index $i \in \{1, \dots, n\}$ is denoted by $v_i$. The votes are combined using a result
function $\rho : C^n \to R$ whose exact definition depends on the voting rule used.


3.3 Verifiability and Accountability 


We start from a generic definition of verifiability from [7]. First of all, we need
to state what exactly we are verifying. We assume a certain property $\gamma$
(Definition 5) that we want to achieve in each protocol run, e.g. that each voter votes at
most once, or that all ballots are well-formed. If $\gamma$ is achieved, then everything
is fine. If $\gamma$ is not achieved, then we at least want to detect such a case.

The definitions of verifiability and accountability used in this paper will be
based on the particular $\gamma$ for quantitative verifiability proposed in [7]. First, let
us define the protocol runs covered by $\gamma$. The idea of the following definition
is that the final tally (i.e. the multiset of ballots before applying $\rho$) of a voting
protocol may differ from the true tally in at most k votes.


Definition 7 (k-correctness of the protocol run [7]). A protocol run $r$,
where $c_1, \dots, c_{n_h}$ are the choices of honest voters, is called k-correct if there
exist valid choices $c'_1, \dots, c'_{n_d}$ (representing possible choices of dishonest voters)
and some choices $\tilde{c}_1, \dots, \tilde{c}_n$ such that:


— an election result is published in $r$ and it is equal to $\rho(\tilde{c}_1, \dots, \tilde{c}_n)$;
— $d((c_1, \dots, c_{n_h}, c'_1, \dots, c'_{n_d}), (\tilde{c}_1, \dots, \tilde{c}_n)) < k$;

where the distance $d$ is defined as $d(\vec{c}_0, \vec{c}_1) = \sum_{c \in C} |f_{count}(\vec{c}_0)[c] - f_{count}(\vec{c}_1)[c]|$,
where $C$ is the set of possible choices, and $f_{count} : C^n \to \mathbb{N}^C$ counts how many
times each choice occurs in a vector.

The set of all k-correct runs of a protocol is denoted by $\gamma_k$.
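
The distance $d$ used in Definition 7 can be computed directly from the two vote vectors. The snippet below is a minimal sketch; the function names follow the definition, while the string labels used for choices are an assumption made for the example.

```python
from collections import Counter
from typing import Hashable, Sequence


def f_count(choices: Sequence[Hashable]) -> Counter:
    """Count how many times each choice occurs in a vector of votes."""
    return Counter(choices)


def distance(c0: Sequence[Hashable], c1: Sequence[Hashable]) -> int:
    """Sum over all choices of the absolute difference of their counts."""
    n0, n1 = f_count(c0), f_count(c1)
    return sum(abs(n0[c] - n1[c]) for c in set(n0) | set(n1))
```

For instance, distance(["a", "a", "b"], ["a", "b", "b"]) evaluates to 2: changing a single vote from "a" to "b" removes one occurrence of "a" and adds one of "b", so each flipped vote contributes twice to the distance, whereas a dropped vote contributes once. The order of votes does not matter, since only the counts are compared.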


In [7], verifiability w.r.t. a property $\gamma$ is quantified by an upper bound on the
probability that:


1. $\gamma$ is not achieved; and
2. this fact remains undetected by a certain designated party J called the Judge.


The particular definition of $\gamma$ can be very different, and various choices of $\gamma$
provide different flavours of verifiability. In this paper, we instantiate the generic
verifiability property of [7] on $\gamma_k$. This leads to the following definition.


Definition 8 ($(k,\delta)$-verifiability). Let $\pi_P$ be an instance of a voting protocol
P with the set of agents $\Sigma$. Let $\delta \in [0,1]$ be the tolerance, $J \in \Sigma$ be the Judge,
and $\gamma_k$ be the set of runs of P such that, for all runs $r \in \gamma_k$, $r$ is k-correct
according to Definition 7. We say that $\pi_P$ is $(k,\delta)$-verifiable w.r.t. J if for all
adversaries $\pi_A$ and $\pi = \pi_P \| \pi_A$, the probability

$\Pr[(\pi^{(\eta)} \models \neg\gamma_k) \wedge (\pi^{(\eta)} \mapsto (J : accept))]$

is $\delta$-bounded as a function of $\eta$, and

$\Pr[\pi^{(\eta)} \mapsto (J : reject)] = 0$

if $\pi \not\models dis(a)$ for all $a \in \Sigma$.


We do not want the attacker to be able to abort the elections, so we need
to specify what actually happens after the Judge rejects. As proposed in [15],
verifiability alone is in general not enough, and in practice we want accountability. This
property assumes that, if the Judge rejects, he needs to come up with a certain
verdict, which states which parties have potentially misbehaved. A verdict is a
boolean formula over statements $dis(a)$ for $a \in \Sigma$. Let $F_{dis}$ be the set of all
boolean formulae of this form. A verdict may take the form of a
disjunction, e.g. $dis(v_i) \vee dis(a)$ for a voter $v_i$ and a voting authority $a \in \Sigma$,
which could mean that it is not clear whether $a$ has dropped the message of the
voter $v_i$, or the voter $v_i$ has not sent a valid message. An accountability constraint
of a protocol P consists of a property $\alpha$ that we want to be satisfied, and a set
of possible verdicts $\varphi_1, \dots, \varphi_\ell$ that the Judge J must come up with when $\alpha$ is
not satisfied.


Definition 9 (Accountability constraint [15]). An accountability constraint
of a protocol P is a tuple $(\alpha, \varphi_1, \dots, \varphi_\ell)$ where $\alpha$ is a property of P (i.e. a subset
of runs of P) and $\varphi_1, \dots, \varphi_\ell \in F_{dis}$.
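
Verdicts can be modelled as predicates over the set of corrupted agents. The following sketch is illustrative only: the type alias Verdict, the helper names any_of and all_of, and the agent labels are hypothetical, and only dis corresponds to the statement used in the text.

```python
from typing import Callable, FrozenSet

# A verdict maps the set of corrupted agents of a run to True or False.
Verdict = Callable[[FrozenSet[str]], bool]


def dis(agent: str) -> Verdict:
    """Atomic statement dis(a): the agent a has been corrupted."""
    return lambda corrupted: agent in corrupted


def any_of(*verdicts: Verdict) -> Verdict:
    """Disjunction of verdicts, e.g. dis(v_i) or dis(a)."""
    return lambda corrupted: any(v(corrupted) for v in verdicts)


def all_of(*verdicts: Verdict) -> Verdict:
    """Conjunction of verdicts."""
    return lambda corrupted: all(v(corrupted) for v in verdicts)
```

With verdict = any_of(dis("v1"), dis("a")), the verdict holds in a run where only the authority "a" is corrupted, and also in a run where only the voter "v1" is corrupted, which reflects that a disjunctive verdict does not single out one party. An individual verdict such as dis("a") holds only in runs where "a" itself is corrupted.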


In this paper, we will be working with the property $\alpha := \gamma_k$ as in Definition 8.
This means that we require accountability if the tally error is at least k, and we
agree to accept smaller errors in the tally.
Definition 10 ($(k,\delta)$-accountability). Let $\pi_P$ be an instance of a voting protocol
P with the set of agents $\Sigma$, and let $J \in \Sigma$ be the Judge. Let $\Phi =
(\gamma_k, \varphi_1, \dots, \varphi_\ell)$ be an accountability constraint where $\gamma_k$ is the set of runs of P such
that, for all runs $r \in \gamma_k$, $r$ is k-correct according to Definition 7.

We say that $\pi_P$ is $(k,\delta)$-accountable w.r.t. $\Phi$ and J if for all adversaries $\pi_A$
and $\pi = \pi_P \| \pi_A$, the probability

$\Pr[(\pi^{(\eta)} \models \neg\gamma_k) \wedge \nexists i : (\pi^{(\eta)} \mapsto (J : \varphi_i))]$

is $\delta$-bounded as a function of $\eta$, and, for all $i \in \{1, \dots, \ell\}$,

$\Pr[\pi^{(\eta)} \mapsto (J : \varphi_i)] = 0$

if $\pi \not\models \varphi_i$.


Ideally, we would like to have individual accountability where every verdict 
blames a particular agent. However, as shown in [15], individual accountabil- 
ity is typically not achieved by voting protocols, and in [2] it was shown that 
resolving a dispute between two agents requires certain assumptions such as 
undeniable channels or trusted authorities. The problem is the communication 
between the voter and the voting system, where a voter may always say that 
“the system does not respond”, and the system may always argue that “the 
voter has not attempted to communicate”. In this work, we will consider general 
accountability. 


3.4 Privacy and Coercion-Resistance 


We take the definition of voter privacy from [16], defined as the inability to
distinguish whether the voter $v \in V$ under observation made the choice $c \in C$
or $c' \in C$. The parameter k quantifies the number of voters under observation.


Definition 11 ($(k,\delta)$-privacy). Let $\pi_P$ be an instance of a voting protocol P
with n voters. Let $\delta \in [0,1]$ be the tolerance. For all $i \in \{1, \dots, n\}$, let $\pi_{v_i}$ be the
honest process of the voter $v_i$. Let $\vec{i} = \{i_1, \dots, i_k\} \subseteq \{1, \dots, n\}$ be the indices of
honest voters under observation, and let $\vec{c}, \vec{c}' \in C^k$ be two assignments of choices
to the voters $\vec{i}$. Denote $\pi_{\vec{i},\vec{c}} = \pi_A \| \pi_{v_{i_1}}(c_1) \| \dots \| \pi_{v_{i_k}}(c_k) \| \pi_{P \setminus \vec{i}}$ for an adversary
process $\pi_A$. We say that $\pi_P$ is $(k,\delta)$-private if the difference of probabilities

$\Pr[\pi^{(\eta)}_{\vec{i},\vec{c}} \mapsto 1] - \Pr[\pi^{(\eta)}_{\vec{i},\vec{c}'} \mapsto 1]$

is $\delta$-bounded as a function of the security parameter $\eta$ for all $\vec{i}$, $\vec{c}$, $\vec{c}'$ and for all
adversaries $\pi_A$.


Differently from Definition 8, here a larger k means stronger privacy guaran- 
tees. The larger k is, the easier it is to distinguish between the two distributions. 
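
To give a feeling for the quantity bounded in Definition 11, the sketch below computes it exactly for a toy setting that is not taken from the paper: k = 1, two candidates (0 and 1), a protocol that reveals nothing but the tally, and a given number of other honest voters who each vote uniformly at random. Under these assumptions, the best distinguishing advantage of any adversary equals the statistical distance between the two tally distributions. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def tally_distribution(others: int, observed_vote: int) -> dict:
    """Distribution of the number of votes for candidate 1.

    `others` honest voters vote uniformly at random; the observed voter
    contributes `observed_vote` (0 or 1) to the count of candidate 1.
    """
    return {
        ones + observed_vote: Fraction(comb(others, ones), 2 ** others)
        for ones in range(others + 1)
    }


def best_advantage(others: int) -> Fraction:
    """Statistical distance between the tallies for the two possible votes."""
    d0 = tally_distribution(others, 0)
    d1 = tally_distribution(others, 1)
    support = set(d0) | set(d1)
    return sum(abs(d0.get(t, 0) - d1.get(t, 0)) for t in support) / 2
```

With no other voters the advantage is 1 (the tally reveals the vote), with one or two other voters it is 1/2, and with ten other voters it drops to 63/256. In this toy setting no value of $\delta$ below these numbers can be achieved, however the tally is computed.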

Let us now consider the definition of coercion-resistance from [16]. A protocol
is called coercion-resistant if the coerced voter, instead of running the dummy
strategy dum (which simply lets all messages be chosen by the coercer), can run
some counter-strategy $\tilde{\pi}$ such that:

1. by running this counter-strategy, the coerced voter achieves their own goal,
e.g., votes for a specific candidate; and

2. the coercer is not able to distinguish whether the coerced voter followed
the coercer’s instructions or tried to achieve their own goal (by running $\tilde{\pi}$).


Similar to the privacy definition, we extend the coercion-resistance of [16] to k
voters, where we allow that up to k voters can be coerced simultaneously. Here
the coerced voters may share a common goal $\gamma$. For example, if the goal of k
coerced voters is to give at least a certain number of votes (at most k) to Alice,
then it does not matter who exactly gave a vote to Alice, and only the total
multiset of votes in the group matters.


Definition 12 ($(k,\delta)$-coercion-resistance). Let $\pi_P$ be an instance of a voting
protocol P with n voters. Let $\delta \in [0,1]$ be the tolerance. Let $\vec{i} = \{i_1, \dots, i_k\} \subseteq
\{1, \dots, n\}$ be the indices of honest voters under observation. Let $\gamma$ be the joint
goal of the voters $\vec{i}$. We say that $\pi_P$ is $(k,\delta)$-coercion-resistant w.r.t. $\gamma$ if there
exists a joint strategy $\tilde{\pi}$ of coerced voters such that the following conditions are
satisfied for any adversary $\pi_A$ connected to $v_{i_1}, \dots, v_{i_k}$ via the interface of dum:


- $\Pr[(\pi_A \| \tilde{\pi} \| \pi_{P \setminus \vec{i}})^{(\eta)} \models \gamma]$ is overwhelming as a function of $\eta$.


- $\Pr[(\pi_A \| dum \| \pi_{P \setminus \vec{i}})^{(\eta)} \mapsto 1] - \Pr[(\pi_A \| \tilde{\pi} \| \pi_{P \setminus \vec{i}})^{(\eta)} \mapsto 1]$ is $\delta$-bounded as a
function of $\eta$.


Note that the counter-strategy does not necessarily belong to the set of honest
voter processes, and e.g. in order to give k votes to Alice, it is allowed that one
of the coerced voters submits a malformed ballot with k votes, while the other
k − 1 coerced voters abstain from voting.


4 Relations Between Definitions 


In this paper, we study relations between the definitions of Sects. 3.3 and 3.4, 
all of which are quantitative. A summary of relations considered in this paper 
is depicted in Fig. 1. We note that it does not cover all possible relationships 
between definitions, and that each relation holds under certain assumptions. The- 
orem 1 and Theorem 2 are based on [16] and [15], and are slightly adapted to 
match the definitions of Sect. 3.4 which use an additional parameter k. An ana- 
logue of Theorem 1 has also been proven in [9], but it is based on non-quantitative 
definitions. Theorem 3 shows that privacy implies verifiability, and the main dif- 
ference from [8] is again that we are considering quantitative definitions. Theo- 
rem 4 demonstrates incompatibility between verifiability and coercion-resistance. 
A similar theorem of [5] considers unconditional privacy instead of quantitative 
coercion-resistance. Another difference is that [5] considers universal verifiabil- 
ity, while we are considering end-to-end verifiability. Theorem 5 demonstrates 
incompatibility of privacy and individual accountability. We have applied some 
ideas of [2] which lists necessary conditions for fair dispute resolution in voting 
protocols, but does not discuss the relation between accountability and privacy 
directly. In this section, we formally state the corresponding theorems and pro- 
vide proof sketches. The full proofs can be found in [20]. 
- $(k, \delta)$-coercion-resistance $\Rightarrow$ $(k, \delta)$-privacy, if the counter-strategy is a valid honest strategy (Theorem 1)
- $(k, \delta)$-accountability $\Rightarrow$ $(k, \delta)$-verifiability, w.r.t. the same judge party (Theorem 2)
- $(k, \delta)$-privacy $\Rightarrow$ $(n_h - 2k, \delta)$-verifiability, verifiability against targeted vote manipulation (Theorem 3)
- $(k, \delta)$-coercion-resistance $\not\Leftrightarrow$ $(k - 1, 1 - \delta)$-verifiability, coercion of misbehaviour (Theorem 4)
- $(k, \delta)$-privacy $\not\Leftrightarrow$ $(n_h - k, 1 - \delta)$-accountability, if accountability is unsafe (Theorem 5)


Fig. 1. Summary of the results of this paper (informal, simplified). Here $n_h$ is the total
number of honest voters, and $k$ and $\delta$ are parameters. The graph depicts relations
between these parameters for different properties of a voting protocol. A unidirectional
arrow $\Rightarrow$ denotes implication, and a negated bidirectional arrow $\not\Leftrightarrow$ denotes properties
that cannot be achieved simultaneously. The arrows can be composed, but one must
be careful that the assumptions of corresponding theorems are all taken into account.


4.1 Coercion-Resistance and Privacy 


Relationships of coercion-resistance and privacy have been studied in [9,16]. 
An interesting outcome of [16] is that, while intuitively coercion-resistance is a 
stronger notion than privacy, for some protocols it is possible that the level of 
privacy is lower than the level of coercion resistance. The reason is that the 
counter-strategy of a voter in Definition 12 does not necessarily belong to the 
set of valid strategies of honest voters, and may protect the vote in a better way 
than following the protocol honestly. However, coercion-resistance is nevertheless 
stronger than privacy if we assume that the counter-strategy does not outperform 
an honest strategy, defined as follows. 


Definition 13 (non-outperforming counter-strategy [16]). Let mp be an 
instance of a voting protocol P. Let i = {i1,...,%%} be the indices of hon- 
est voters under observation. Let nr be a process that is only connected to 
the agents v;,,..., Vi, using the interface of dum, and acts on their behalf 
according to an honest strategy m,(C) := Toi, (C1) || --- Il, (Ck). Let to(@) := 
Ty, (C1)||.--||To, (Ck). Let 15(@ be a joint counter-strategy of the honest voters i 
whose goal is to make choices C= {c1,..., ck}. We say that the counter-strategy 
Tä does not outperform the honest voting strategy of wp if, for any adversary 
process TA that is not connected to TX, and any choices € and Ë, 


Pr[(mealln§ l Olr) 6 1 — Prial Olep A 


is negligible as a function of the security parameter $\eta$.

We adapt a theorem of [16] to our definitions.


Theorem 1. Let $\pi_P$ be a $(k,\delta)$-coercion-resistant instance of a voting protocol
$P$. Assume that, for any subset of $k$ coerced voters, the coercion counter-strategy
$\tilde{\pi}_v$ does not outperform the honest voting strategy of $\pi_P$ (Definition 13). Then,
$\pi_P$ is $(k,\delta)$-private.

Proof (Sketch). Suppose that $\pi_P$ is not $(k,\delta)$-private. There exist $k$ voters $\vec{i}$,
choices $\vec{c}$ and $\vec{c}'$, and an adversary process $\pi_A$ such that

$$\Pr[T_{\vec{i},\vec{c}}^{(\eta)} \mapsto 1] - \Pr[T_{\vec{i},\vec{c}'}^{(\eta)} \mapsto 1]$$

is not $\delta$-bounded as a function of $\eta$, where $T_{\vec{i},\vec{c}}$ is defined as in Definition 11.

Let us now describe a coercer that breaks coercion-resistance. Consider a
particular setting where the true goal of the voters $\vec{i}$ is to make the choice $\vec{c}$.
Let $\pi_X$ be a coercer that selects for the voters the input $\vec{c}'$, and otherwise acts
as an honest voter would. By construction of $\pi_X$,


Pr[(rallr4 dumire) Â 1] = Prit? 6 1) 


Let $\pi_v = \pi_{v_{i_1}} \| \cdots \| \pi_{v_{i_k}}$. By definition of $T_{\vec{i},\vec{c}}$,


Prl(rallm Olr)” 4 1] = Prins 4S. 


Since $\tilde{\pi}_v$ does not outperform $\pi_v = \pi_{v_{i_1}} \| \cdots \| \pi_{v_{i_k}}$, and there are no direct
connections between $\pi_A$ and $\pi_X$,


A é A 
Pr[(rallr (alla pg) S 1|- Pr[(mal|4 lla Olr ó 1l 


is negligible as a function of $\eta$. We get that
rI(TA A umM||7 pz > 1| — Pri(Ta ri TH(C)||T pz 4 
P cjd m) 41- P c (m) A1 


is not $\delta$-bounded as a function of $\eta$. Let $\pi_{A'} := \pi_A \| \pi_X$ be an adversary that
outputs the final output of $\pi_A$. Such $\pi_{A'}$ breaks $(k,\delta)$-coercion-resistance. Since
$\pi_A$ does not interact with $\vec{i}$ (as they are honest), and $\pi_X$ interacts only with $\vec{i}$
using the interface of dum, $\pi_{A'}$ satisfies Definition 12.


4.2 Accountability and Verifiability 


It has been proven in [15] that verifiability can be treated as a special case of 
accountability. We adapt a theorem of [15] to our definitions. 


Theorem 2. Let an instance $\pi_P$ of a voting protocol $P$ be $(k,\delta)$-accountable
w.r.t. a Judge $J$ and a property $\Phi = (\gamma_k, \phi_1, \dots, \phi_\ell)$ where $\forall i: \phi_i \in F_{dis}$. Then,
$\pi_P$ is $(k,\delta)$-verifiable w.r.t. a Judge $J'$ who is defined similarly to $J$, accepting
those runs where $J$ does not output any verdict $\phi_i$, and rejecting all the other
runs.

Proof (Sketch). Let $\pi := \pi_A \| \pi_P$. Suppose that $\pi_P$ is not $(k,\delta)$-verifiable w.r.t.
$J'$. The verifiability may fail due to one of the following reasons:


1. There is a run where $J'$ outputs reject, but all parties are honest. Then,
there is a run where $J$ outputs a verdict $\phi_i$ while all parties are honest. This
violates the accountability requirement that $\Pr[\pi^{(\eta)} \mapsto (J : \phi_i)] = 0$ if $\pi \not\models \phi_i$.

2. Suppose that there exists an adversary process $\pi_A$ such that

$$\Pr[(\pi^{(\eta)} \models \neg\gamma_k) \wedge (\pi^{(\eta)} \mapsto (J' : \mathrm{accept}))]$$

is not $\delta$-bounded as a function of $\eta$.

Let us show that $\pi_A$ breaks accountability as well. By assumption, $J'$ outputs
reject iff $J$ outputs a verdict $\phi_i$. Hence, the event $\pi^{(\eta)} \mapsto (J' : \mathrm{accept})$ is as
likely as the event $\neg\exists i\, (\pi^{(\eta)} \mapsto (J : \phi_i))$, hence,

$$\Pr[(\pi^{(\eta)} \models \neg\gamma_k) \wedge \neg\exists i: \pi^{(\eta)} \mapsto (J : \phi_i)]$$

is also not $\delta$-bounded as a function of $\eta$.


4.3 Privacy and Verifiability 


Without additional assumptions, the verifiability of Definition 8 is neither essential
for the privacy formalized in Definition 11, nor contradicts it. It is not essential
since, e.g., if the adversary violates the property $\gamma_k$ by directly interacting
with the final tally, when the ballots are not linked to the identities of voters
anymore, it will not help in breaking privacy. It does not contradict privacy, e.g.,
if the Judge's verdict only depends on inputs of dishonest parties.


Considered Attacks. The importance of verifiability for privacy has been
demonstrated in [8]. The necessity of avoiding duplicate ballots in order to preserve
privacy is mentioned in [3]. While our results and definitions are formally
different, the considered actual attacks are of similar nature, and are related to
manipulating the ballots which the attacker can link to identities of particular
voters. We consider verifiability against particular types of attacks that could be
applied to violate the goal $\gamma$. Let us briefly summarize our results.


– Add ballots: suppose that the attacker is capable of ballot stuffing.
  • If the added ballots do depend on the votes of honest voters (e.g. some
    ballot of an honest voter is replayed), then the attack reduces the privacy
    of voters whose ballots are replayed.
  • If the added ballots do not depend on the votes of honest voters (e.g. are
    chosen by the attacker or are sampled randomly), then the attack does
    not directly help in breaking privacy.
– Drop ballots: suppose that the attacker is capable of ballot dropping.
  • If the attacker drops ballots of some honest voters, it reduces the privacy
    of the remaining voters who are still counted.
  • If the attacker drops ballots of some dishonest voters, it does not directly
    help in breaking privacy.
– Substitute ballots: this attack can be viewed as a combination of ballot adding
  and dropping. The privacy can be reduced in the following two cases:
  • The inserted ballot does depend on the votes of honest voters.
  • The replaced ballot does not depend on the votes of honest voters.


It is important that the attacker knows whether ballot manipulation has 
succeeded or not. We need the notion of a detectable protocol property. 


Definition 14 (detectable property). Let $\pi := \pi_A \| \pi_P$ be a voting protocol
instance $\pi_P$ running in parallel with an adversary $\pi_A$. Let $\gamma$ be a property of $\pi$.
We say that $\gamma$ is detectable in $\pi$ if

$$\Pr[(\pi_O \| \pi)^{(\eta)} \mapsto 1 \mid \gamma] - \Pr[(\pi_O \| \pi)^{(\eta)} \mapsto 1 \mid \neg\gamma] = 1$$

for a passive observer process $\pi_O$ who has access to the internal state of $\pi_A$, but
does not directly interact with $\pi_P$.


We could quantify the probability in Definition 14 as $\delta$, introducing an extra
parameter into relations between privacy and verifiability.


Considered Voting Rules. Many voting systems reveal not just the voting 
result, but also the full tally, which shows the exact number of votes per can- 
didate. Revealing such information can lead to high privacy leakage. For that 
reason, some voting systems like Ordinos [13] ensure that only the final result is 
revealed, e.g. the identity of the winner, and it has been shown in [13] that doing 
this may reduce privacy leakage significantly. In this work, we want to quantify 
attacks on privacy that are possible even if only the final result is revealed. 
The main idea is that, even if we do not know the particular distribution of
votes and cannot compute the privacy parameter $\delta$ precisely, we can apply the attack
on verifiability to change the number of votes that are "known in advance" to
the attacker and thus switch between $(k, \delta_k)$- and $(k', \delta_{k'})$-privacy. This can be
useful for certain kinds of voting rules, satisfying the following definition.


Definition 15 (majority-determined voting rule). Let $n$ be the total number
of voters. A voting rule is called majority-determined if it is sufficient to
cast $n' = \lfloor n/2 \rfloor + 1$ identical votes to determine the election outcome.


While Definition 15 is trivially satisfied in the case where the election result 
is a counting histogram of votes, it actually holds for a greater variety of widely 
used voting rules. The following descriptions of voting rules are taken from [6]. 


– Plurality rule. Each voter votes for one favorite candidate, and the winner
  is the candidate with the most votes.

– Borda rule. Each voter orders candidates by preference and each candidate
  $j$ gets $m - i$ points in each vote, where $i$ is the rank of $j$ in the vote, and $m$ is
  the number of candidates; the winner is the candidate with the highest total
  points.
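
The two rule descriptions above can be made concrete with a small sketch. The function names, the alphabetical tie-break and the brute-force check are assumptions made for illustration only; they are not taken from [6] or from the paper. The exhaustive check is run for the plurality rule on a small electorate, where a bloc of $\lfloor n/2 \rfloor + 1$ identical votes fixes the winner whatever the remaining voters do.

```python
from collections import Counter
from itertools import product


def plurality_winner(votes):
    """Winner under the plurality rule; each vote names one candidate."""
    tally = Counter(votes)
    # Assumption: ties are broken alphabetically (the text fixes no tie-break).
    return min(tally, key=lambda c: (-tally[c], c))


def borda_scores(rankings, candidates):
    """Borda points: a candidate at rank i (1-based) in a vote gets m - i points."""
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            scores[cand] += m - rank
    return scores


def bloc_determines_plurality(n, candidates):
    """Check by brute force that floor(n/2) + 1 identical votes fix the winner."""
    bloc_size = n // 2 + 1
    for target in candidates:
        bloc = [target] * bloc_size
        for rest in product(candidates, repeat=n - bloc_size):
            if plurality_winner(bloc + list(rest)) != target:
                return False
    return True
```

For example, `bloc_determines_plurality(5, "abc")` enumerates all completions of three identical votes by two further voters.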

More examples of suitable rules satisfying Definition 15 can be found in [20].
While these voting rules guarantee success for an attacker who controls a majority
of votes, in practice it is unlikely that all honest voters prefer the same
candidate, and the attacker may be successful even when controlling far fewer than
half of the votes. This is closely related to the notion of manipulability of voting.
The authors of [21] have estimated asymptotic bounds for the fraction of voters
that need to be manipulated for switching the election outcome to be hard in the
average case. It would be interesting to consider such bounds in future research.

There are some standard voting rules for which Definition 15 does not hold. 
E.g. in a veto rule, each voter gives a score of 0 to one least favorite candidate, 
and 1 to every other candidate, and the winner is the candidate with the most 
votes. Here it is possible that all voters that are not controlled by the attacker 
will veto the particular candidate chosen by the attacker, but the attacker does 
not have enough votes to veto each of the other candidates. 
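
The veto counterexample can be checked on a toy election. The sketch below is an illustration under assumed conventions (alphabetical tie-break, hypothetical function names): it collects every winner that can occur when a bloc of $\lfloor n/2 \rfloor + 1$ voters casts identical veto votes and the remaining voters vote arbitrarily.

```python
from itertools import product


def veto_winner(vetoes, candidates):
    """Veto rule: each voter gives 0 points to one candidate and 1 to all others."""
    scores = {c: len(vetoes) for c in candidates}
    for vetoed in vetoes:
        scores[vetoed] -= 1
    # Assumption: ties are broken alphabetically.
    return min(candidates, key=lambda c: (-scores[c], c))


def outcomes_with_bloc(n, candidates, bloc_veto):
    """All winners reachable when floor(n/2) + 1 voters veto the same candidate."""
    bloc_size = n // 2 + 1
    bloc = [bloc_veto] * bloc_size
    return {
        veto_winner(bloc + list(rest), candidates)
        for rest in product(candidates, repeat=n - bloc_size)
    }
```

With five voters and three candidates, every choice of the bloc's veto leaves more than one possible winner, so the bloc alone does not determine the outcome.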


Results. We now show how privacy implies verifiability against certain types of
targeted attacks on votes, i.e. attacks where the attacker is able to link manipulated
ballots to the identities of the voters who cast these ballots. We will also assume
that the attacker knows whether the attack has succeeded or not. The main idea is that,
for majority-determined voting rules, if $k > n_h/2$, the attacker can always win
in the distinguishability game of Definition 11 by taking choices $\vec{c}$ and $\vec{c}'$ that
produce different election outcomes. We cannot get a better result without taking
into account a particular vote distribution, since it is possible that there is a
candidate whom the remaining $n_h - k$ voters will choose with overwhelming
probability, resulting in a constant election result $r$ that does not say anything
about the victim's choice.


Proposition 1. Let $\pi_P$ be an instance of a voting protocol $P$ that uses a
majority-determined voting rule, with $n_h$ honest voters $V_H$. If $\pi_P$ is $(k, \delta)$-private
w.r.t. a subset of voters $V_{pr} \subseteq V_H$ of size $k$, then $\pi_P$ is $(n_h - 2k, \delta)$-verifiable
against an attacker $\pi_A$ who has access to Out$(J)$, who is only able to drop votes
of $V_H \setminus V_{pr}$ from the tally, and whose success does not depend on the choices of $V_H$,
assuming that the property $\gamma_{n_h-2k}$ is detectable in $\pi_A \| \pi_P$.


Proof (Sketch). Regardless of the prior distribution of votes, if a protocol uses
a majority-determined voting rule and $k > n_h/2$, the attacker may always choose
votes $c_1,\dots,c_k$ and $c'_1,\dots,c'_k$ that determine some election results $r \neq r'$. If
$k < n_h/2$, the attacker can use the attack on verifiability to drop some of the
$n_h - k$ ballots of voters that are not under observation, until a majority of ballots
belongs to voters under observation. Suppose that the attacker has managed to
drop $\ell$ ballots. He will control $k$ out of $n_h - \ell$ ballots. In order to control a
majority, he needs $k > (n_h - \ell)/2$, which means $\ell > n_h - 2k$ dropped ballots. If
dropping $\ell$ ballots has failed, the attacker will detect it and output a constant
bit, which will be the same regardless of the choices of $V_{pr}$. Since the protocol
is by assumption $(k,\delta)$-private, the attacker should not be able to drop these $\ell$
ballots with probability larger than $\delta$.

Proposition 2. Let $\pi_P$ be an instance of a voting protocol $P$ that uses a
majority-determined voting rule, with $n_h$ honest voters $V_H$. If $\pi_P$ is $(k, \delta)$-private
w.r.t. a subset of voters $V_{pr} \subseteq V_H$, then $\pi_P$ is $(n_h - 2k, \delta)$-verifiable against an
attacker $\pi_A$ who has access to Out$(J)$, who is only able to duplicate votes of
$V_{pr}$ in the tally, and whose success does not depend on the choices of $V_H$, assuming
that the property $\gamma_{n_h-2k}$ is detectable in $\pi_A \| \pi_P$.


Proof (Sketch). Regardless of the prior distribution of votes, if a protocol uses
a majority-determined voting rule and $k > n_h/2$, the attacker may always choose
votes $c_1,\dots,c_k$ and $c'_1,\dots,c'_k$ that determine some election results $r \neq r'$. If
$k < n_h/2$, the attacker can use the attack on verifiability to duplicate some of
the $k$ ballots of voters under observation, until a majority of ballots belongs to
voters under observation. Suppose that the attacker has managed to produce
$\ell$ duplicates. He will control $k + \ell$ out of $n_h + \ell$ ballots. In order to control a
majority, he needs $k + \ell > (n_h + \ell)/2$, which is $\ell > n_h - 2k$ additional ballots.
Since the protocol is by assumption $(k, \delta)$-private, the attacker should not be
able to get these additional $\ell$ ballots with probability larger than $\delta$.


Propositions 1 and 2 put the same constraint on verifiability, which does 
not depend on whether the attacker adds or drops the votes. This leads to the 
following theorem, which is an immediate consequence of the propositions above. 
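
The arithmetic behind both propositions can be checked mechanically. The sketch below uses hypothetical helper names and only restates the two majority conditions from the proof sketches; it confirms that dropping and duplicating lead to the same threshold $\ell > n_h - 2k$.

```python
def controls_majority_after_drop(n_h, k, dropped):
    """k observed ballots out of n_h - dropped: is k > (n_h - dropped) / 2?"""
    return 2 * k > n_h - dropped


def controls_majority_after_duplication(n_h, k, duplicates):
    """k + duplicates out of n_h + duplicates: is k + l > (n_h + l) / 2?"""
    return 2 * (k + duplicates) > n_h + duplicates


def min_manipulated_ballots(n_h, k):
    """Smallest integer l with l > n_h - 2k (0 if k is already a majority)."""
    return max(0, n_h - 2 * k + 1)
```

For instance, with $n_h = 10$ and $k = 3$ the attacker needs at least 5 dropped ballots (3 out of 5 remaining) or 5 duplicates (8 out of 15).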


Theorem 3. Let $\pi_P$ be an instance of a voting protocol $P$ that uses a majority-determined
voting rule, with $n_h$ honest voters $V_H$. If $\pi_P$ is $(k,\delta)$-private w.r.t. a
subset of voters $V_{pr} \subseteq V_H$, then $\pi_P$ is $(n_h - 2k, \delta)$-verifiable against an attacker
$\pi_A$ capable of duplicating votes of $V_{pr}$ and dropping votes of $V_H \setminus V_{pr}$, assuming
that success of the attack does not depend on the particular choices of the voters
$V_H$, and the property $\gamma_{n_h-2k}$ is detectable in $\pi_A \| \pi_P$.


The attacks of Theorem 3 are mostly oriented to small-scale elections with
few voters. Suppose that the attacker is interested in the vote of a particular
single voter, i.e. $k = 1$. Let there be $n_h$ honest voters for an even $n_h$. The
attacker attempts to drop $\frac{n_h}{2}$ ballots belonging to the remaining $n_h - 1$ voters,
and introduces $\frac{n_h}{2}$ copies of the ballot under observation instead.
There are still $n_h$ votes in the final tally, but $\frac{n_h}{2} + 1$ of them are copies of the
ballot under observation, so the winner of the election is the main preference of
the victim. It is interesting that when the attacker combines vote adding and
dropping, in the end, the protocol run may still satisfy $\gamma_{n_h-2}$ if the dropped
votes happen to be the same as the added votes. Such an attack
is formally treated as unsuccessful, and in practice, we may get tighter bounds
if we measure "success of substituting $k$ votes" instead of "violating $\gamma_{k-1}$".
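
The single-victim example can be replayed on concrete ballots. This is a sketch under stated assumptions: votes are plain candidate labels, the rule is plurality, and the attacker simply drops the first half of the other ballots; the function names are hypothetical.

```python
from collections import Counter


def substitute_for_single_victim(victim_vote, other_votes):
    """Drop n_h/2 of the other ballots and add n_h/2 copies of the victim's ballot."""
    n_h = len(other_votes) + 1
    if n_h % 2 != 0:
        raise ValueError("the example assumes an even number of honest voters")
    half = n_h // 2
    kept_others = other_votes[half:]   # the first n_h/2 other ballots are dropped
    copies = [victim_vote] * half      # replaced by copies of the observed ballot
    return [victim_vote] + copies + kept_others


def plurality_winner(votes):
    """Candidate with the most votes (no tie can occur for a strict majority)."""
    return Counter(votes).most_common(1)[0][0]
```

With six honest voters, the manipulated tally still has six ballots, four of which carry the victim's choice, so the published winner reveals that choice.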

Such types of attack are more interesting in terms of coercion. Suppose that
the attacker already controls $n_d$ dishonest voters, and in addition, is able to
manipulate $\ell$ ballots with a high probability of success. If $n_d + \ell < \frac{n}{2}$, then it
is not enough to switch the election result and make a certain candidate $j$ the
winner. The attacker tries to convince $k = (n_h - \ell)/2$ voters to vote for $j$. If in
the end, $j$ is not the winner, the attacker learns that at least some voters of the
coerced group have not obeyed, and may punish them.

4.4 Verifiability and Coercion-Resistance


Suppose that the attacker is trying to convince a subset of k voters to misbehave. 
It can be viewed as a variant of coercing abstention from voting (since bad votes 
are not supposed to be counted), or even an attempt to halt the elections, in 
the case when Judge’s rejection does not allow proceeding with publishing the 
result. Attacks of this kind, called fault attacks, have been considered in [9],
and the attacker can apply them to test the loyalty of a voter (or a subset of 
voters) in a probabilistic way. The following definition allows the attacker to 
break k-correctness by taking control of a certain number of dishonest voters. 


Definition 16 (ballot-corruptible protocol). An instance $\pi_P$ of a voting
protocol $P$ is called ballot-corruptible if, for all $k \in \mathbb{N}$, there exists a subset of
voters $V' := \{v_{i_1}, \dots, v_{i_\ell}\}$ of size $\ell \le k + 1$, and a joint strategy bad for these $\ell$
voters, such that

$$\Pr[(\pi_{P \setminus V'} \| \mathrm{bad})^{(\eta)} \models \neg\gamma_k] = 1$$

where $\gamma_k$ is defined as in Definition 7.


We could quantify the probability in Definition 16 as $\delta$, introducing an extra
parameter into relations between coercion-resistance and verifiability.

Definition 16 allows the attacker to interact with the protocol in such a way
that $\gamma_k$ will actually be violated and the judging procedure triggered. In practice,
the bad voting strategy may correspond to submitting corrupted paper ballots,
or malformed digital ballots that e.g. encode several votes in a single ballot.
Moreover, fewer than $k + 1$ voters can be sufficient to break $\gamma_k$-correctness, e.g. by
submitting multiple votes in a single corrupted ballot.

The following theorem estimates a relation between verifiability and
coercion-resistance for ballot-corruptible protocols. The idea is that, even if the corrupted
final result is not published, the fact that the cheating was detected may already
leak something. Since the Judge's decision cannot leak more than a single bit,
the attacker needs to encode information into that bit in such a way that it tells
whether the inputs of the victim voter(s) are $\vec{c}$ or $\vec{c}'$.


Theorem 4. Let $\pi_P$ be an instance of a ballot-corruptible voting protocol $P$ with
$n_h$ honest voters. Then the following statements cannot be true at once:


– $\pi_P$ is $(k,\delta)$-coercion-resistant (Definition 12) against an attacker who has
  access to Out$(J)$;

– The instance $\pi_{P'}$ of $P$ with $n_h - k$ honest voters is $(k - 1, 1 - \delta)$-verifiable
  (Definition 8).


Proof (Sketch). Let $V'$ be the $k$ voters of $\pi_P$ to be coerced. Consider the protocol
instance $\pi_{P'}$ where $V'$ are treated as corrupted. Let $\pi_{A'}$ be an adversary who
sends a corrupt message to $V'$ and follows the strategy bad on their behalf, but
does not corrupt any other agents. Let $\pi_A$ be an adversary that behaves similarly
to $\pi_{A'}$, except that it does not send a corrupt message to $V'$, but is just connected
to them via the interface of dum. Such $\pi_A$ satisfies Definition 12. The processes
$\pi_{A'} \| \pi_{P'}$ and $\pi_A \| \mathrm{dum} \| \pi_{P \setminus V'}$ differ only in the interface between the protocol
and the adversary, but the output of $J$ is the same in these processes.

– If the voters $V'$ obey the attacker in $\pi_A \| \pi_P$, they follow the strategy dum,
  and since $P$ is ballot-corruptible, the goal $\gamma_{k-1}$ will be violated. Since $\pi_{P'}$ is
  $(k - 1, 1 - \delta)$-verifiable, the Judge will accept with probability at most $1 - \delta$
  in $\pi_{A'} \| \pi_{P'}$, and hence also in $\pi_A \| \mathrm{dum} \| \pi_{P \setminus V'}$.

– While the definition of coercion-resistance does not prohibit that the
  counter-strategy may violate $\gamma_{k-1}$, it is reasonable to assume that the goal of the
  coerced voters is that the elections end up successfully and the Judge will
  accept. Hence, if the voters $V'$ do not obey the attacker, the Judge will accept
  with probability 1.


The difference between the probabilities of the Judge accepting is at least $\delta$. The
attacker outputs 1 iff the Judge accepts, breaking $(k, \delta)$-coercion-resistance.


In practice, Theorem 4 could be applied by an attacker who coerces k voters 
to put corrupted ballots into the ballot box. The attacker then looks into the 
ballot box and sees whether it contains at least k corrupted ballots. In the real 
world, however, it is not excluded that the “bad” vote can occasionally be cast 
as well by voters who are not controlled by the attacker, even though it is not 
intended behaviour. Such voters add certain randomness to the experiment. 

If the voting protocol is accountable, the coerced voters might not want
the Judge to accuse them of misbehaviour, so they might not agree
to follow the strategy bad unless the attacker threatens them with a more severe
punishment than the Judge. However, accountability may in turn provide other
means of coercion, as discussed in the following section.


4.5 Privacy and Accountability 


If the Judge's verdict is independent of the choices of honest participants, it will
not harm the privacy of an honest voter in any way. However, as shown in [2], if
we want to get a stronger kind of accountability (individual accountability)
that allows pinpointing the cheater directly, we may need stronger assumptions.
In order to resolve all possible disputes between a voter $v_i$ and a non-voter agent
$a$ (such as a voting machine), we need to either assume a semi-trusted $a$ (who
processes all received ballots honestly), or the existence of reliable and/or
undeniable channels between the voter and the machine, such as voting authorities
who actually saw that the voter indeed has interacted with the machine. While
an undeniable channel does not leak the exact choice of a voter, it would still at
least leak the fact that a voter has voted. Let us formally define an accountability
property $\Phi$ that does not threaten the privacy of honest voters.


Definition 17 (safe-evidence accountability property). Let $P$ be a voting
protocol instantiation. Let $\delta \in [0,1]$ be the tolerance. Let $T_{\vec{i},\vec{c}}$ and $T_{\vec{i},\vec{c}'}$ be defined
as in Definition 11. We say that the accountability property $\Phi = (\alpha, \phi_1, \dots, \phi_\ell)$
of $P$ w.r.t. a Judge $J \in X$ is $(k,\delta)$-safe-evidence if

$$\Pr[T_{\vec{i},\vec{c}}^{(\eta)} \mapsto 1 \mid \exists j: \pi \mapsto (J : \phi_j)] - \Pr[T_{\vec{i},\vec{c}'}^{(\eta)} \mapsto 1 \mid \exists j: \pi \mapsto (J : \phi_j)]$$

is $\delta$-bounded as a function of the security parameter $\eta$ for all indices of honest
voters $\vec{i}$, choices $\vec{c}, \vec{c}'$ and for all adversary processes $\pi_A$ that have access to the
channels In$(J)$.


Definition 17 says that the evidence for a verdict, based on all inputs that $J$
has received through the channels In$(J)$, does not depend (much) on the choices
of honest voters. The condition $\exists i: \pi \mapsto (J : \phi_i)$ ensures that we only consider
protocol runs where the Judge has actually made a verdict, which excludes
possible attacks that come due to failure of accountability, e.g. leakage via the
final result. The definition allows an arbitrary property $\alpha$.

In order to break privacy, the attacker should first of all be able to violate the
condition $\alpha$, so that the judging procedure is triggered. Then, for the Judge
to learn anything interesting, the evidence should depend
on the vote of an honest voter under observation, at least telling whether the
voter has voted or abstained from voting. The following definition characterizes
protocols for which accountability has a direct impact on privacy.


Definition 18 (unsafe accountability property). Let $\pi_P$ be an instance of
a voting protocol $P$, $X$ the agents of $P$, $\Phi = (\gamma_k, \phi_1, \dots, \phi_\ell)$ an accountability
property, and $J \in X$ the Judge. The property $\Phi$ is called unsafe in $\pi_P$ w.r.t. $J$ if
there exists an adversary $\pi_A$ such that:


1. $\Pr[(\pi_P \| \pi_A)^{(\eta)} \models \neg\gamma_k] = 1$.

2. There is a choice $c \in C$ such that, in every run $r$ of $\pi$ satisfying $\exists i: (J : \phi_i)$,
there is a subset $\vec{i}_r$ of $k+1$ honest voters (which can be different in each run)
such that $(\pi_P \| \pi_A)$ outputs a boolean value voted$(i, c)$ for all $i \in \vec{i}_r$ to In$(J)$.


Intuitively, the second point of Definition 18 says that, whenever the Judge 
makes a verdict, he learns something about a subset of voters somehow involved 
in a dispute. The parameter k could be e.g. the minimal number of complaints 
required to start the dispute resolution procedure. A particular example of an 
unsafe accountability property would be individual accountability that relies on 
undeniable channels, assuming that the Judge makes the verdict based on access 
to these channels. In that case, $c$ would be an abstention vote. Let us show how
Definition 18 is related to Definition 17.


Proposition 3. Let $\pi_P$ be an instance of a voting protocol $P$ with $n_h$ honest
voters. Let $X$ be the agents of $P$, $\Phi = (\gamma_k, \phi_1, \dots, \phi_\ell)$ an accountability property,
and $J \in X$ the Judge. Let $\Phi$ be unsafe in $\pi_P$ w.r.t. $J$. Then, $\Phi$ is not
$(k, \delta)$-safe-evidence w.r.t. $J$ and $\pi_A$ for any
$\delta < 1 - \prod_{j=0}^{k'} \left(1 - \frac{k}{n_h - j}\right)$ and any $\eta$.

Proof (Sketch). Let $\pi_A$ be an adversary that satisfies Definition 18. Consider
the runs of $(\pi_P \| \pi_A)$ that satisfy $\exists i: (J : \phi_i)$. In each such run $r$, there is
a subset $\vec{i}_r$ of $k'$ voters such that messages voted$(i,c)$ are sent to a channel of
In$(J)$ for all $i \in \vec{i}_r$. The idea is that the same attacker $\pi_A$ chooses $\vec{c} = (c, \dots, c)$
and $\vec{c}' = (c', \dots, c')$ for $c' \neq c$ to break the safe-evidence property. However, the
problem is that $\vec{i}_r$ can be different in each run, but we need a single $\vec{i}$ for all
runs. The simplest solution would be to take $k' = n_h - k$, so that any subset of
size $k' + 1$ always covers at least one victim. However, we can do better since the
adversary may choose the $\vec{i}$ itself. In the worst case (from the attacker's perspective),
no subset of voters is preferable, and all voters are equally likely to be exposed
to In$(J)$. The probability that all $k' + 1$ leaked votes are "not interesting" is

$$\binom{n_h - k}{k' + 1} \Big/ \binom{n_h}{k' + 1}, \quad \text{which equals} \quad \prod_{j=0}^{k'} \frac{n_h - k - j}{n_h - j} = \prod_{j=0}^{k'} \left(1 - \frac{k}{n_h - j}\right).$$
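
The equality between the binomial ratio and the product can be verified with exact rational arithmetic. The sketch below is only a numerical check of that identity; the function names are hypothetical, and `k_prime` stands for $k'$.

```python
from fractions import Fraction
from math import comb


def miss_probability_binomial(n_h, k, k_prime):
    """C(n_h - k, k' + 1) / C(n_h, k' + 1): none of the k victims is exposed."""
    t = k_prime + 1
    return Fraction(comb(n_h - k, t), comb(n_h, t))


def miss_probability_product(n_h, k, k_prime):
    """prod_{j=0}^{k'} (1 - k / (n_h - j)), the same quantity in product form."""
    p = Fraction(1)
    for j in range(k_prime + 1):
        p *= 1 - Fraction(k, n_h - j)
    return p
```

For example, with $n_h = 10$, $k = 1$ and $k' = 2$ both forms give $7/10$, so at least one victim is exposed with probability $3/10$.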


The following theorem estimates the relation between privacy and account- 
ability for an unsafe accountability property. 


Theorem 5. Let $\pi_P$ be an instance of a voting protocol $P$ with $n_h$ honest voters.
Let $X$ be the agents of $P$. Let $\Phi = (\gamma_k, \phi_1, \dots, \phi_\ell)$ and $J \in X$ be such that $\Phi$ is
unsafe in $\pi_P$ w.r.t. $J$. Then the following statements cannot be true at once:

– $\pi_P$ is $(k,\delta)$-private (Definition 11);

– $\pi_P$ is $\left(k', 1 - \delta \Big/ \left(1 - \prod_{j=0}^{k'} \left(1 - \frac{k}{n_h - j}\right)\right)\right)$-accountable w.r.t. $\Phi$, $J$
  (Definition 10).


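Proof (Sketch). Assume that $\pi_P$ is $(k', \delta_{acc})$-accountable. The condition
$\exists i: \pi \mapsto (J : \phi_i) \vee \pi \models \gamma_k$ is satisfied with probability at least $1 - \delta_{acc}$. Since $\Phi$ is
by assumption unsafe in $\pi_P$ w.r.t. $J$, there exists an adversary $\pi_A$ such that
$\Pr[(\pi_A \| \pi_P)^{(\eta)} \models \neg\gamma_k] = 1$, so $\exists i: \pi \mapsto (J : \phi_i)$ is satisfied with probability at
least $1 - \delta_{acc}$. Assume that $\Phi$ is $(k, \delta_{ev})$-safe-evidence w.r.t. $J$ and $\pi_A$. The success
of $\pi_A$ in distinguishing whether the voters $\vec{i}$ have voted or not equals $\delta_{ev} (1 - \delta_{acc})$.
Assuming that the protocol is $(k, \delta_{pr})$-private, we have $\delta_{ev} \cdot (1 - \delta_{acc}) \le \delta_{pr}$, so
$\delta_{ev} \le \delta_{pr} / (1 - \delta_{acc})$. Now, since $\Phi$ is unsafe w.r.t. $J$, by Proposition 3, it can only
be $(k, \delta_{ev})$-safe-evidence w.r.t. $J$ for
$\delta_{ev} \ge 1 - \prod_{j=0}^{k'} \left(1 - \frac{k}{n_h - j}\right)$, which gives us
$\delta_{acc} \ge 1 - \delta_{pr} \Big/ \left(1 - \prod_{j=0}^{k'} \left(1 - \frac{k}{n_h - j}\right)\right)$, and any smaller $\delta_{acc}$ is not suitable.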
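
To get a feeling for the bound in Theorem 5, it can be evaluated for concrete parameters. This is a sketch with hypothetical function names; it only evaluates the expression $1 - \delta_{pr} / (1 - \prod_{j=0}^{k'} (1 - \frac{k}{n_h - j}))$ from the proof sketch, clipped at 0, and assumes $k \ge 1$ so that the denominator is positive.

```python
from fractions import Fraction


def exposure_probability(n_h, k, k_prime):
    """1 - prod_{j=0}^{k'} (1 - k / (n_h - j)): some victim is among the exposed voters."""
    miss = Fraction(1)
    for j in range(k_prime + 1):
        miss *= 1 - Fraction(k, n_h - j)
    return 1 - miss


def min_accountability_tolerance(delta_pr, n_h, k, k_prime):
    """Smallest delta_acc compatible with (k, delta_pr)-privacy according to the bound."""
    q = exposure_probability(n_h, k, k_prime)  # positive whenever k >= 1
    return max(Fraction(0), 1 - Fraction(delta_pr) / q)
```

For $n_h = 10$, $k = 1$, $k' = 2$ and $\delta_{pr} = 1/10$, the exposure probability is $3/10$, so the accountability tolerance cannot be smaller than $2/3$.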


In practice, Theorem 5 could be applied by an attacker who takes control
over a voting machine that issues receipts for later verification, such as
Wombat [1], ThreeBallot, and VAV [22]. The idea is that the corrupted machine will
output appropriate receipts to all voters. However, it excludes at least $k$
ballots when displaying information on the bulletin board. With probability at
most $\delta_{acc}$, the attack will not be detected, and the Judge does not do anything.
Otherwise, there are several outcomes possible.


– The cheating is detected directly by auditors.
– Sufficiently many voters complain after looking at the bulletin board.


In the first case, the Judge does not learn anything interesting from the 
evidence. In the second case, a subset of voters whose ballots have been dropped 
come to complain, and the attacker who has corrupted the voting machine can 
now match the complainer’s identity with an affected ballot. If the ballots are 
not encrypted, the attacker will not only detect that the voter has voted, but 
also match the corrupted ballot to the complainer’s identity and learn the vote. 

5 Conclusions and Future Work


In this paper, we have proposed a selection of quantitative definitions of privacy, 
verifiability, coercion-resistance, and accountability, which are adapted versions 
of the definitions of the KTV framework. We have shown how these metrics are 
related to each other, exploring some generic relations that do not depend on 
the actual distribution of votes. In practice, the quantitative degree of privacy 
of voting protocols strongly depends on the way in which the voters make their 
choices. As the next step, it will be natural to analyse particular distributions. 

Assuming that the votes are independent, the privacy definition that we have 
considered in this paper can be viewed as a variant of distributional differential 
privacy (DDP), albeit DDP estimates the ratio of probabilities instead of the 
difference. Related work [18] has estimated DDP bounds for various voting rules, 
and we could study how their definitions of privacy can be combined with veri- 
fiability and accountability of the KTV framework. 

Abstract. Deception is a form of active defense that aims to confuse
and divert attackers who try to tamper with a system. Deceptive techniques
have been proposed for web application security, in particular, to
enrich a given application with deceptive elements such as honey cookies,
HTTP parameters or HTML comments. Previous studies describe
how to automatically add such elements to, and remove them from, the application
traffic; however, the elements themselves need to be decided manually,
which is a tedious task (especially for large-scale applications) and makes
the adoption of deception more cumbersome.

In this paper, we aim to automate the generation of deceptive HTTP 
parameter names for a given web application. Such parameters should 
seamlessly blend into application context and be indistinguishable from 
the rest of the parameters, in order to maximize the deception effect. To 
achieve this, we propose to use word embeddings trained with a domain- 
specific corpus obtained from existing web application source code. We 
evaluate our method through a survey, where we ask the participants to 
identify the deceptive parameters in two different web applications’ APIs. 
Moreover, the survey is composed of two variants in order to further 
experiment with the impact of the quantity and enticement of deceptive 
parameters. 

The results confirm the effectiveness of our method in generating indis- 
tinguishable honey parameter names. We also find that the participants’ 
expectation of the ratio of honey parameters remains constant, regard- 
less of the actual number. Thus, a higher number of honeytokens can 
provide a stronger defense. Moreover, making attackers aware of decep- 
tion can help to obfuscate the real attack surface, e.g., by masquerading 
more than 10% of the real application elements to look like traps. Finally, 
although our work focuses on the generation of parameter names, we also 
discuss other related challenges in a holistic way, and provide multiple 
directions for future research. 


Keywords: Web application security · Deception · Active defense


1 Introduction 


As part of a defense-in-depth strategy, deception works by confusing and misleading
the adversary with false information, while masking the real nature of a
system, or repackaging it to look like something else [14,16]. Various studies have
shown that deception can be an effective defense mechanism, not only for attack
detection [17,34,47], but also for impeding the attack progress and disrupting
attackers' emotional and cognitive state in various ways [26,28,29]. Moreover,
the deception technology market has been growing in recent years [6,7], with several
commercial solutions providing data, network, application or endpoint layer
deception [18,37,50].

The focus of this study is on web application layer deception. So far, the main 
idea has been to augment the application with deceptive elements (also called 
honeytokens, which can be in the form of HTTP parameters, cookies, HTML 
elements, permissions, or user accounts) in order to showcase a fake attack sur- 
face [30-32,34,35,42,45]. Monitoring the modifications to the values of such 
deceptive elements allows to detect attackers who are tampering with the appli- 
cation in order to find vulnerabilities. For instance, a common attack vector is 
called web parameter tampering, where the attacker manipulates the application 
parameters exchanged between the server and client, in an attempt to modify 
privileges, get access to unauthorized information, exploit business logic vulner- 
abilities, or disrupt the integrity of the application data [20,41]. The attacker 
may tamper with an object ID in the URL parameter to exploit an improper 
access control mechanism (known as the Insecure Direct Object Reference vul- 
nerability [46]); or try to modify, e.g., the price of a product sent in a hidden 
form field, which was assumed to be immutable by the developer [41]. The use 
of deceptive elements provides a reliable source of warning in such cases, as the
regular users of the application are not likely to intercept the communication 
and try to tamper with application data. 

Most of the previous work on application layer deception focuses on how to 
add the deceptive elements with minimal effort. They use a reverse-proxy in 
front of the application that adds and removes the deceptive elements on the fly, 
seamlessly, so that the application itself will not require any modifications [30, 
32,34]. Previous work also conducts CTF-based experiments to measure the
effectiveness of application layer deception [34], including when the attackers 
are aware of the presence of deception [47]. 

While these studies focus on automating the injection of deceptive elements 
into the application, they do not really address the challenges related to the 
generation of such elements, leaving this as an open research problem. In fact, a 
survey on deception techniques in computer security [35] draws attention to the 
lack of proper honey-token generation strategies for web applications and cloud 
images. Other studies emphasize the need to create “content-oriented deceptions 
to deceive skilled attackers in the long term” [26] and draw attention to the dif- 
ficulty of creating such context-specific elements [34]. Previous work also finds 
that deceptive elements should be well intertwined with the application function- 
ality and logic, to be robust against the deception awareness of the attacker [47]. 
In this paper, we address this research area of automatically generating realistic 
deceptive elements for web applications. In particular, we focus on the automated 
generation of deceptive HTTP parameters, as they can be effectively used in every
API endpoint, covering a large attack surface of parameter tampering. 

Deceptive HTTP parameters can be any type of HTTP parameter (such as 
the query, path, body or form parameters). However, coming up with context-specific
deceptive parameters and embedding them seamlessly into the application
raises multiple challenges:


— How to choose realistic names for the parameters? 

— How to make sure that the parameters are enticing enough? 

— How to assign them plausible values? 

— Where to place the parameters within an API? 

— What is the optimal number of deceptive parameters? 

— What should be the proper response when a certain parameter is tampered 
with? 


We focus on the first challenge, which is to automatically generate plausible 
deceptive parameter names that are difficult to distinguish from the real param- 
eters. For this, we implement a machine learning method to generate parameter 
names that will blend well into the context of a given application (Sect. 3). In
particular, we use word embeddings (a Natural Language Processing technique) 
trained with the source code of publicly available web applications. 

Then we evaluate the effectiveness of our method via a questionnaire with 42 
participants (Sect. 4). We ask the participants to identify the deceptive parameters
in two different web applications’ APIs. Our questionnaire also experiments
with two additional challenges: the amount and enticement of deceptive param- 
eters (Sect. 5). 

In addition to showing that our method successfully generates indistinguish- 
able parameter names, we make several other observations: First, we find that 
the participants anticipate a certain ratio of parameters to be deceptive, regard- 
less of the actual quantity of deceptive elements. Thus, adding a larger number 
of deceptive elements would mean that more of them will go undetected. Second, 
the addition of very obvious (conspicuous) deceptive parameters does not really 
help to hide the existence of realistic ones. Third, we find that the participants 
mislabel at least 10% of genuine parameters as deceptive, on average. This provides
further evidence of the benefit of informing attackers about the use of
deception.


2 Method 


As mentioned earlier, we aim to automate the generation of deceptive HTTP 
parameters that are in agreement with the context of the web application to 
be protected. For such tasks, the Natural Language Processing (NLP) domain 
offers different techniques, such as specialized lists, lexical dictionaries and word 
embeddings. 

Specialized lists have the drawback of needing to be handcrafted, a process 
which can be time-consuming and requires domain-specific knowledge. Lexical 
dictionaries have been used in traditional NLP approaches. They are networks
of meaningfully and semantically related words and concepts (synsets) and pro- 
vide graph representations of the relationships of a vocabulary. Nevertheless, 
lexical dictionaries might not be able to keep up with the quick evolution of the 
language, as well as with domain-specific jargon. 

Finally, embeddings are vectorial representations of words mapped onto a 
reduced dimensionality space where similar words (embeddings) are close to 
each other. The distance between these embeddings is often measured using the 
cosine similarity or any other distance between vectors. Due to their data-driven 
nature, embeddings are able to capture the relationships between words in specific
contexts. They have been popularized in recent years by their
applications in NLP using rather simple neural network architectures, such
as the ones proposed by word2vec [38] and GloVe [43]. More complex language
models have been developed more recently (e.g., ELMo [44], BERT [27]). Nevertheless,
these models are often trained on vast English text corpora coming
from natural language sources as varied as news articles, Wikipedia entries,
literary and web content, among others. While they are useful for generic language
understanding tasks, they can struggle with applications that contain a large
domain-specific vocabulary.
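
To make the notion of distance between embeddings concrete, the following minimal sketch computes the cosine similarity between word vectors. The three-dimensional vectors and the words chosen are hypothetical values for illustration only; real word2vec embeddings typically have on the order of a hundred dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (not taken from any trained model).
embeddings = {
    "cart": [0.9, 0.1, 0.0],
    "basket": [0.8, 0.2, 0.1],
    "socket": [0.0, 0.3, 0.9],
}

close = cosine_similarity(embeddings["cart"], embeddings["basket"])
far = cosine_similarity(embeddings["cart"], embeddings["socket"])
print(f"cart~basket: {close:.3f}, cart~socket: {far:.3f}")
```

Words that occur in similar contexts end up with a similarity close to 1, which is the property the parameter generation relies on.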

Outside the realm of natural languages, embeddings have also been used to
model programming languages, either in a sequence-of-tokens fashion (supported
by the naturalness hypothesis [13]) or by embedding elements in graph representations
of code (e.g., abstract syntax trees or control flow graphs) [15,23,25].

In this paper, we propose the use of embeddings of source code by treating 
it in a sequence-of-tokens fashion. We chose smaller and
simpler models like word2vec because they are less data-hungry and
easier to train than more complex models. Due to the very specific
nature of our application, we train our language model with a dataset specifically 
created for this task. 


2.1 Data Collection and Training 


In order to create a domain-specific dataset that will capture the terminology and 
technical context of web applications, we use the source code of web applications 
available at public GitHub repositories. We start with a list of public GitHub 
Java repositories with more than 5 stars (watchers), which was made available 
by Chen et al. [24] and includes 83,082 repository URLs collected from GHTor- 
rent [33] database (last updated on 2019-06-01). Among these, we remove the 
repositories that include android or mobile keywords in the repository name, 
and focus on the repositories with at least 15 stars, which reduces the list to 
38,376 URLs. 

As our purpose is to generate HTTP parameters for web APIs, we try to limit 
the training data to the repositories with web application relevant source code. 
We do this in a coarse-grained way, by pruning the dataset to only contain the 
repositories that include web related libraries: We download each repository and 
look for “import” statements for library names such as org.springframework.web,
javax.servlet, org.apache.http, httpcomponents and okhttpclient.
Finally, we end up with the source code from 10,324 repositories, which corre- 
sponds to 4,002,776 Java files. 
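
The coarse-grained filter described above could be sketched as follows. The library names are the ones listed in the text; the regular expression and the function name uses_web_library are assumptions made for this illustration, not the authors' implementation.

```python
import re

# Library names mentioned in the text (lowercased for matching).
WEB_LIBRARIES = (
    "org.springframework.web",
    "javax.servlet",
    "org.apache.http",
    "httpcomponents",
    "okhttpclient",
)

# Assumed pattern for Java import lines, including "import static".
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def uses_web_library(java_source: str) -> bool:
    """Return True if any import statement mentions a web-related library."""
    for imported in IMPORT_RE.findall(java_source):
        name = imported.lower()
        if any(lib in name for lib in WEB_LIBRARIES):
            return True
    return False


servlet_file = "package a;\nimport javax.servlet.http.HttpServletRequest;\nclass A {}"
plain_file = "import java.util.List;\nclass B {}"
print(uses_web_library(servlet_file), uses_web_library(plain_file))
```

A repository would then be kept if at least one of its Java files passes this check.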

We further refine the Java files according to their name: The files that are 
likely to not have a context related to the functionality of the application (e.g., 
util, filter, exception, config, parser, test) and the files that might 
have a too specific context (e.g., coin, blockchain, droid, Activity) are 
removed. 

For each project, for each remaining Java file, we parse the file (using the 
JavaLangParser Python library) to extract the relevant input to train the 
word2vec model. While word2vec is normally trained with sentences from natural
language, we construct the sentences as sequences of tokens collected from the
source code. In particular, each of the following items forms a separate sentence 
by appending the relevant tokens together: 


— Each method name (MethodDeclaration) and the names of method parame- 
ters 

— Each class constructor (ConstructorDeclaration) and the names of construc- 
tor parameters 

— The names of the class fields (FieldDeclaration) 

— All the variable names in the class (VariableDeclaration) 


The motivation is that each of these sentences includes tokens that are likely 
to belong to the same context. Note that each token (method, parameter, variable,
or constructor name) is split by underscore or camel case (if such a naming
convention was used), and then converted to lower case. 
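
The token splitting step can be illustrated with a short sketch. The function name split_identifier and the handling of acronyms are assumptions of this example; the text only states that names are split by underscore or camel case and lowercased.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split an identifier on underscores and camel-case boundaries, lowercased."""
    tokens = []
    for part in name.split("_"):
        # Lower-to-upper boundary: "cartEntry" -> "cart Entry".
        part = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", part)
        # Acronym followed by a word: "HTTPRequest" -> "HTTP Request".
        part = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", part)
        tokens.extend(t.lower() for t in part.split())
    return tokens


print(split_identifier("postCartEntry"))
print(split_identifier("product_variant_id"))
```

A sentence for a method would then be the concatenation of the tokens of the method name and of each of its parameter names.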


Post-processing: In each sentence, we remove the tokens that are specific to the 
Java language, and tokens that do not carry any contextual meaning.¹ Finally,
we train the word2vec model using the Python gensim.models library, with the 
default parameters. 
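
As a rough sketch of this post-processing and training step, the snippet below filters the tokens listed in footnote 1 and then hands the sentences to gensim. The helper names clean_sentence and train_model and the toy sentences are assumptions; the training function is only defined here, since it needs the gensim package and a realistically sized corpus.

```python
# Tokens excluded in post-processing, as listed in footnote 1.
STOP_TOKENS = {
    "has", "have", "init", "start", "stop", "get", "set", "main", "create",
    "delete", "update", "read", "add", "remove", "is", "on", "by", "to",
    "test", "parse", "write", "initialize", "string", "int", "boolean", "char",
}


def clean_sentence(tokens):
    """Drop Java-specific and context-free tokens from one sentence."""
    return [t for t in tokens if t not in STOP_TOKENS]


def train_model(sentences):
    """Train word2vec with gensim's default parameters (requires gensim)."""
    from gensim.models import Word2Vec  # imported lazily; not needed for cleaning
    return Word2Vec(sentences=sentences)


# Hypothetical toy sentences built from method and parameter names.
sentences = [
    ["get", "cart", "entry", "id"],
    ["set", "product", "variant", "id", "string"],
]
cleaned = [clean_sentence(s) for s in sentences]
print(cleaned)
```

With gensim's defaults, words that occur fewer than five times are dropped from the vocabulary, so the toy sentences above are far too small to train on.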

Independently from this process, we also save all the variable names with 
built-in types (e.g., String, boolean, int, array) for each project in a separate csv 
file. This corresponds to 8,844,562 variables. We later use this data to find the 
most suitable parameter type for the generated deceptive parameter names. 


2.2 Generation of Parameter Names 


To generate deceptive parameters for a target application, we assume that
the API specification of the application is available. In particular, we assume
an OpenAPI specification [40] as input. The OpenAPI (formerly known as Swagger [49])
specification aims to standardize the descriptions of RESTful APIs. In
addition, the Swagger project provides various tools for testing and development,


1 These words are has, have, init, start, stop, get, set, main, create, 
delete, update, read, add, remove, is, on, by, to, test, parse, write, 
initialize, string, int, boolean, char. 
together with a specific user interface to view and try out the API (called Swagger
UI [12]). In our experiments, we use an alternative Swagger user interface
called Bootprint [8], as it outputs a static HTML page with a simpler design 
that is more appropriate for our purpose. 

Once we have the Swagger specification (which is often in json or yaml for- 
mat), we flatten [48] the file and convert it to the csv format, where we have 
each HTTP parameter and the related information (endpoint, HTTP method, 
name and type) in one row. Note that endpoints may pair with multiple different
HTTP methods, and each endpoint-method pair is likely to have multiple
parameters. Figure 1 shows an example API specification of an endpoint-method 
pair in json format (a), together with how it looks on Bootprint-Swagger UI (b) 
and our conversion to csv format (c).


    "/carts/{id}/entries": {
      "post": {
        "operationId": "postCartEntry",
        "parameters": [
          { "type": "string", "name": "id", "in": "path" },
          { "type": "string", "name": "productVariantId", "in": "formData" },
          { "type": "integer", "name": "quantity", "in": "formData" }
        ]
      }
    }

(a) Swagger json file for the endpoint-method pair


    POST /carts/{id}/entries

    productVariantId   formData   string
    id                 path       string
    quantity           formData   integer (int32)

(b) Swagger UI


    Endpoint           | Method | Location | Name             | Type
    /carts/id/entries  | post   | path     | id               | string
    /carts/id/entries  | post   | formData | productVariantId | string
    /carts/id/entries  | post   | formData | quantity         | integer

(c) API endpoint converted to csv


Fig. 1. Example Swagger input for the POST method of /carts/{id}/entries endpoint.
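
The flattening step shown in Fig. 1 could look roughly like the sketch below, which turns an OpenAPI v2 document (already loaded into a Python dictionary) into one row per parameter. The function names flatten_spec and to_csv and the column names are assumptions for this illustration; the original work relies on an existing flattening tool [48].

```python
import csv
import io

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}
COLUMNS = ["endpoint", "method", "location", "name", "type"]


def flatten_spec(spec: dict) -> list[dict]:
    """Return one row per (endpoint, method, parameter) of an OpenAPI v2 dict."""
    rows = []
    for endpoint, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            for param in operation.get("parameters", []):
                rows.append({
                    "endpoint": endpoint,
                    "method": method.lower(),
                    "location": param.get("in", ""),
                    "name": param.get("name", ""),
                    # Body parameters in v2 carry a schema instead of a type.
                    "type": param.get("type", ""),
                })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize the flattened rows as csv text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The endpoint-method pair of Fig. 1.
spec = {"paths": {"/carts/{id}/entries": {"post": {
    "operationId": "postCartEntry",
    "parameters": [
        {"type": "string", "name": "id", "in": "path"},
        {"type": "string", "name": "productVariantId", "in": "formData"},
        {"type": "integer", "name": "quantity", "in": "formData"},
    ],
}}}}
rows = flatten_spec(spec)
print(to_csv(rows))
```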


Next, we use our word2vec model to generate deceptive elements for the 
endpoint-method pairs in the API. In particular, we use the most_similar()
method of the word2vec library to get the top n words that are the most similar to
a list of existing elements. We form the existing elements list depending on the 
location of the HTTP parameter: 
— Path parameters: Path parameters are located in the URL path of an endpoint,
and often point to a specific resource [9]. We only attempt to embed a
deceptive path parameter to the endpoints that already have at least one path 
parameter. First, we group the endpoints up to their very first path parame- 
ter. Next, within each group, we collect all the endpoint URLs, split the words 
by underscore or camel case if necessary, and form our existing elements list. 
If this approach does not yield any proper output (due to the thresholds that 
we will explain later), an alternative approach is to collect the first level URL 
components of all endpoints as the existing elements list. Note that the generated
deceptive path parameter will be added to all the endpoints in this
group, to keep the parameters consistent between endpoints.

— Query or body parameters: Query parameters are located at the end of
the URL, after a question mark (e.g., in the ‘?name1=value1&name2=value2’ format).
Describing the body parameters, on the other hand, is a bit more complex:
The earlier version of OpenAPI (v2) differentiates between the formData
parameters that describe the payload of a request, and the body parameters
that describe an object with a data structure [10]. However, the latest version
(OpenAPI v3) categorizes both of them under the RequestBody type [11].
We process the query and body parameters for each of the endpoint-method
pairs: We take the existing query or body parameter names, in addition to
the tokens from the URL path of the related endpoint (again split by
underscore or camel case, if necessary). Note that, as our word2vec model is
deterministic once it is trained, it will generate the same output for the same 
set of existing elements. This allows us to preserve the consistency between dif- 
ferent endpoints: For instance, two different endpoints that update an address 
object with the same body parameters will also be assigned the same decep- 
tive parameter. 
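
The construction of the existing elements list for both parameter locations can be sketched as follows. All function names are hypothetical, and a few details are assumptions of this example rather than statements from the text: path placeholders such as {id} are ignored, parameter names are split like URL tokens, and duplicates are removed while preserving order.

```python
import re
from collections import defaultdict


def split_identifier(name):
    """Split on underscores and camel-case boundaries, lowercased."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name.replace("_", " "))
    return [t.lower() for t in spaced.split()]


def path_tokens(endpoint):
    """Tokens of the literal URL components (placeholders are skipped)."""
    tokens = []
    for component in endpoint.strip("/").split("/"):
        if component and not component.startswith("{"):
            tokens.extend(split_identifier(component))
    return tokens


def group_by_first_path_parameter(endpoints):
    """Group endpoints by the URL prefix preceding their first path parameter."""
    groups = defaultdict(list)
    for endpoint in endpoints:
        if "{" in endpoint:
            groups[endpoint.split("{", 1)[0]].append(endpoint)
    return dict(groups)


def existing_elements_for_path(group_endpoints):
    """URL tokens of all endpoints in one group, de-duplicated in order."""
    tokens = [t for e in group_endpoints for t in path_tokens(e)]
    return list(dict.fromkeys(tokens))


def existing_elements_for_query_or_body(endpoint, param_names):
    """Parameter-name tokens plus URL tokens of one endpoint-method pair."""
    tokens = [t for n in param_names for t in split_identifier(n)]
    tokens.extend(path_tokens(endpoint))
    return list(dict.fromkeys(tokens))


endpoints = ["/carts/{id}", "/carts/{id}/entries", "/users/{userId}/addresses", "/health"]
groups = group_by_first_path_parameter(endpoints)
print(groups)
print(existing_elements_for_query_or_body("/carts/{id}/entries", ["productVariantId", "quantity"]))
```

Because the resulting list depends only on the endpoint and its parameters, two endpoints with identical inputs receive the same list and therefore the same generated parameter.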


Finally, our algorithm aims to insert deceptive parameters only if it has a high 
‘confidence’ that the generated element will fit in the context of the endpoint- 
method pair. For this, we implement the following four steps: 


(i) Making sure that the existing elements list contains sufficient input: For 
query and body/form parameters, we set a threshold for the minimum num- 
ber of existing elements: If there are fewer elements than this threshold, we 
choose to not generate a deceptive parameter for the given endpoint-method 
pair and parameter location. For path parameters, we also require a certain 
number of endpoints per group, to be able to assign a deceptive parameter 
to this group. 

(ii) Making sure that the input words are known to the model: It is possible 
that some of the existing elements will not be present in the vocabulary of 
our word2vec model, as our training dataset may not contain them. Thus, 
our second threshold becomes the minimum known_words_ratio: the ratio of 
existing elements that are present in the vocabulary. If these two thresholds 
are met, we take the top n most similar words as our candidate deceptive 
parameters. 
(iii) Post-processing to check if the candidate parameters have sufficient similarity
to existing elements: We compute the average similarity score of
candidate parameters to the existing elements. (The similarity scores are 
returned by the most_similar() method.) If this value is less than our 
average_similarity_score threshold, we choose to not insert any deceptive 
parameter for this endpoint-method pair and parameter location. 

(iv) Post-processing to avoid repeating parameters: Finally, we remove the can- 
didate deceptive parameters that are morphologically too close to any 
of the existing elements. For example, if “paymentid” is an existing element
and our model generates “paymentnum”, we remove “paymentnum”
from the candidate list. For this, we use the ratio() method from the 
difflib.SequenceMatcher [3] class in Python, to compute a measure of 
similarity between two sequences. We set a sequence_matching_score thresh- 
old to decide whether a candidate should be removed. After all these steps, 
the final deceptive parameter becomes the first element in the candidate 
deceptive elements list, having the highest similarity score value. 
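
The four steps above can be summarized in a short sketch. It assumes a model object that exposes a key_to_index vocabulary and a most_similar(positive=..., topn=...) method returning (word, score) pairs sorted by decreasing score, as gensim's KeyedVectors does; here a StubModel with made-up neighbours stands in for the trained model. The threshold constants follow Table 1, read as strict inequalities, and the function name pick_deceptive_parameter is hypothetical.

```python
from difflib import SequenceMatcher

# Threshold values as given in Table 1.
TOP_N = 5
MIN_EXISTING_ELEMENTS = 2
MIN_KNOWN_WORDS_RATIO = 0.7
MIN_AVERAGE_SIMILARITY = 0.6
MAX_SEQUENCE_MATCHING_SCORE = 0.5


def pick_deceptive_parameter(model, existing):
    """Return one deceptive name for `existing`, or None if confidence is low."""
    # (i) Enough existing elements as input.
    if not len(existing) > MIN_EXISTING_ELEMENTS:
        return None
    # (ii) Enough of the input words are in the model vocabulary.
    known = [w for w in existing if w in model.key_to_index]
    if not len(known) / len(existing) > MIN_KNOWN_WORDS_RATIO:
        return None
    candidates = model.most_similar(positive=known, topn=TOP_N)
    if not candidates:
        return None
    # (iii) Candidates are, on average, similar enough to the input.
    average = sum(score for _, score in candidates) / len(candidates)
    if not average > MIN_AVERAGE_SIMILARITY:
        return None
    # (iv) Drop candidates that are morphologically too close to the input.
    kept = [
        word for word, _ in candidates
        if all(SequenceMatcher(None, word, e).ratio() <= MAX_SEQUENCE_MATCHING_SCORE
               for e in existing)
    ]
    return kept[0] if kept else None


class StubModel:
    """Stand-in for a trained model; neighbours and scores are invented."""

    def __init__(self, vocabulary, neighbours):
        self.key_to_index = {w: i for i, w in enumerate(vocabulary)}
        self._neighbours = neighbours

    def most_similar(self, positive, topn):
        return self._neighbours[:topn]


model = StubModel(
    vocabulary=["payment", "card", "order"],
    neighbours=[("paymentnum", 0.81), ("invoice", 0.74), ("billing", 0.70),
                ("checkout", 0.66), ("refund", 0.61)],
)
result = pick_deceptive_parameter(model, ["payment", "card", "order", "paymentid"])
print(result)
```

In this toy run, “paymentnum” is rejected in step (iv) because it is too close to “payment” and “paymentid”, so the next candidate is selected.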


Fine Tuning the Algorithm: We tried our algorithm on 17 real-world Swagger
API specifications that we collected online (using Google dorks such as
intitle: “swagger.json” site:github.com). Note that we made sure that the collected
APIs do not overlap with the GitHub repositories used in our training.
By experimenting with these APIs to generate deceptive elements, we come up 
with a set of threshold values that provide a good starting point. Table 1 gives 
these threshold values, which we also use for evaluating the performance of our 
method in the next section. 

On a final note, we also assign a type (e.g., int, boolean, string) to the 
generated deceptive elements using the dataset of more than 8 million variables
collected in Sect. 2.1. To infer a type, we first look for an exact match between 
the generated parameter name and the variable names dataset. If it does not 
exist, we again use the SequenceMatcher class with a similarity score threshold 
of 0.8. This algorithm was able to infer a type for almost all of the parameter 
names in our initial experiments. 
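
A minimal sketch of this type inference is given below. It assumes the variable dataset has been reduced to a dictionary mapping each name to a single type, which is a simplification of this example (the same name may well occur with several types in the real data); the function name infer_type and the toy entries are hypothetical.

```python
from difflib import SequenceMatcher


def infer_type(name, known_types, threshold=0.8):
    """Return the type of the exact or closest-matching known variable name."""
    # Exact match first.
    if name in known_types:
        return known_types[name]
    # Otherwise fall back to the most similar known name, if similar enough.
    best_name, best_score = None, 0.0
    for candidate in known_types:
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is not None and best_score >= threshold:
        return known_types[best_name]
    return None


# Hypothetical excerpt of the variable-name dataset.
known_types = {"invoiceid": "int", "isactive": "boolean", "username": "String"}
print(infer_type("isactive", known_types), infer_type("invoiceids", known_types))
```

A linear scan like this would be slow over millions of names; an index or a pre-filter by prefix would be a natural refinement.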


Table 1. Thresholds and values that affect the generation of parameters. 


Threshold                     Value   Threshold                  Value
n                             5       known_words_ratio          >0.7
number of endpoints           >2      average_similarity_score   >0.6
number of existing elements   >2      sequence_matching_score    >0.5


3 Evaluation 


In this section we aim to evaluate the performance of our method in generat- 
ing indistinguishable deceptive parameters. A common evaluation method that 
is also used in previous work [21,45] is to ask human subjects to differentiate
deceptive elements from genuine elements. However, while in the previous
work the human subjects were informed upfront that 50% of the elements they would
evaluate were deceptive, we do not give any hints about the number of deceptive
parameters. Moreover, we present the subjects with real-world APIs using the 
Bootprint Swagger UI, so that they can get a sense of the application and thor- 
oughly observe all the endpoint-method pairs, parameters and their types. We 
use a separate questionnaire where we list all the distinct parameter names (cat- 
egorized by location such as query, form, path), and ask the subjects to mark 
the parameters that they think are deceptive. 

Although this evaluation method does not allow participants to interact with 
a running instance of the application, it allows them to really focus on the names 
of the parameters. In fact, if they were able to interact with the application,
they could rely on additional criteria (e.g. value of the parameter, response to 
tampering) to decide if a parameter is deceptive or not. Thus, presenting the 
participants with a static API specification better fits our purpose of evaluating 
the indistinguishability of parameter names. 


3.1 Preparation of the API Specifications 


For this experiment, we choose two APIs among the set of 17 real-world APIs
mentioned in the previous section, according to the following criteria:


— The APIs should have a roughly equal number of endpoints and parameters
to achieve more reliable results in statistical tests.

— The applications’ context should be easy to grasp so that the participants 
can make more informed decisions (i.e., reducing the randomness that might 
emerge from not understanding the API). 

— The number of API parameters should be reasonable for manual evaluation,
so as not to overwhelm or distract the human subjects, and to make the survey
feasible to complete in a reasonable amount of time.


In particular, the two APIs we choose include (i) a cloud integration API 
for an e-commerce application [1], and (ii) a community based laboratory plat- 
form for various professions [4]. The first one has 63 distinct parameters and 38 
endpoint-method pairs, and the second one has 74 distinct parameters and 33 
endpoint-method pairs. 

To prepare the APIs for the experiment, we first anonymize the specification,
removing the application name, all descriptions, and fields ignored by our study
such as response status.² Then we generate the deceptive parameters using the
method described in the previous section. We insert the generated parameters back
into the Swagger specification, so that the Bootprint Swagger UI can display


2 Full list of fields removed from Swagger: “info”, “description”, “host”, “tags”, “summary”,
“responses”, “definitions”, “enum”, “example”, “security”, “securityDefinitions”,
“x-example”, “minimum”, “maximum”, “readOnly”, “maxLength”, “minLength”,
“pattern”, “required”.
them. Our algorithm generates 8 deceptive parameters for e-commerce and 9 for
the laboratory platform. We will call this the default mode, and will denote the
APIs as E-Commerce_D and Lab-Platform_D, respectively.

In addition, our experiment also aims to measure the effect of (i) high quan- 
tity and (ii) conspicuous (i.e., easily visible, obvious, attracting attention [22]) 
deceptive parameters. For this, we divide the participants into two
groups: Each group is presented with two applications, one in the default mode
and the other with one of the additional characteristics (either a higher quantity
of parameters, or very conspicuous parameters added). Table 2 shows various
statistics on the number of distinct deceptive parameters and affected endpoint-method
pairs for the API variants used in our experiment.


High Quantity of Deceptive Parameters: We apply this variant to the
e-commerce application, denoted as E-Commerce_Q. To have a significantly
higher number of deceptive parameters compared to the default mode (which
is E-Commerce_D), we first use the additional results generated by our model
(i.e., more parameter names from the candidate deceptive parameters list). With
this, we obtain 6 more parameters in addition to the 8 parameters generated
in default mode. However, while we want this API variant to have a statistically
significantly higher number of deceptive elements, our model was not able to
generate that many parameter names, as we apply several thresholds to choose
the best candidates. Thus, we added 15 additional, manually chosen realistic
parameters, for a total of 29 deceptive parameters.

To show that E-Commerce_Q has significantly more deceptive parameters
than E-Commerce_D, we employ two-proportions Z-tests: Looking at the
ratio of (the number of distinct deceptive parameters) / (the total number of
distinct parameters), we find a z-score of -3.0609 and a p-value of .00222. Thus,
the result is significant at a confidence level of 95%. Moreover, in terms of the
ratio of (the number of endpoint-method pairs with deceptive elements) / (the
total number of endpoint-method pairs), we also find a statistically significant
difference (z-score = -1.9742, p-value = .02442, significant at p < .05).
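
For readers who want to retrace these numbers, the following sketch implements a pooled two-proportions Z-test with the standard library only. The counts are the honeytoken and total counts reported in Table 2; the function name is hypothetical. Under these assumptions, the first reported p-value appears to correspond to a two-tailed test and the second to a one-tailed test, up to rounding.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportions Z-test; returns (z, one-tailed p, two-tailed p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Lower tail of the standard normal distribution via the error function.
    tail = 0.5 * (1 + math.erf(-abs(z) / math.sqrt(2)))
    return z, tail, 2 * tail


# Distinct honeytokens / distinct parameters (default vs. high quantity).
z_params, _, p_params_two_tailed = two_proportion_z_test(8, 71, 29, 92)
# Endpoint-method pairs with honeytokens / all endpoint-method pairs.
z_pairs, p_pairs_one_tailed, _ = two_proportion_z_test(22, 38, 30, 38)

print(f"parameters: z = {z_params:.4f}, p = {p_params_two_tailed:.5f}")
print(f"pairs:      z = {z_pairs:.4f}, p = {p_pairs_one_tailed:.5f}")
```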


Conspicuousness of Deceptive Parameters: We use this API variant in the 
laboratory platform application, denoted as Lab-Platform_C. To manually add
conspicuous deceptive parameters to the API (in addition to the realistic ones), 
we use two different strategies: 


— Parameters that look too enticing and do not follow the naming conven- 
tion of the application (e.g., use of camelcase instead of underscore, upper- 
case letters): Examples are MakeAdmin, FullPrivileges, ADMIN_PERM, 
cl4ssifi3d ID. 

— Parameters that do not have any meaning or that are out-of-context of the 
application: Examples are yoyo, pysantx, vv, disclosed. 


To make sure that the parameters are indeed conspicuous, we conducted an initial
evaluation with 7 participants, presenting them with a preliminary version of the survey
and asking them to mark the parameters that they thought were deceptive. All
participants marked the conspicuous parameters as deceptive. Note that the
participants who were involved in the initial evaluation were not invited to the
real experiment.


Table 2. Breakdown of the API variants: showing the distinct number of honeytokens, 
affected endpoint-method pairs, and their ratios to the total number of parameters and 
endpoint-method pairs. 


API variant      Survey      # distinct parameters:   # endpoint-method pairs:
                             honeytokens / total      with honeytokens / total
E-Commerce_D     Survey I    8 / 71 (11%)             22 / 38 (58%)
E-Commerce_Q     Survey II   29 / 92 (32%)            30 / 38 (79%)
Lab-Platform_D   Survey II   9 / 83 (11%)             19 / 33 (57%)
Lab-Platform_C   Survey I    17 / 91 (19%)            19 / 33 (57%)

[Per-location columns (Path, Query, Form, Body) for distinct honeytokens and
endpoint-method pairs: not legible in the source.]

3.2 Preparation of the Surveys 


Our experiment consists of two survey versions. Survey I contains the E-Commerce_D
and Lab-Platform_C APIs, and Survey II contains E-Commerce_Q and
Lab-Platform_D. Note that participants were not aware that there were two different
versions of the survey. We advertised the survey with a single URL that redirects 
to a different version of the survey each time it is requested. We changed the 
redirection rules from time to time, to ensure that both versions will have the 
same number of participants. 

Both surveys start with a section that describes the purpose of the survey. 
In particular, it states the following: 


In this experiment, you will be presented with 2 different application APIs 
that include a number of honey parameters. We will ask you to identify 
the parameters that you think are deceptive (i.e., if you were to attack this 
application, you would avoid tampering those parameters to avoid being 
detected). 


Although we anonymize the APIs beforehand, we inform the participants that 
they are real-world APIs, and ask them to not search for the original APIs online 
for the sake of the validity of the study. 

The first three questions of the survey aim to learn about participants’ profile 
(current job title) and their experience in information security and deception
technology. Then we have a different section for each application, where we first 
give a link to the Bootprint Swagger UI of the API. We then ask participants 
to identify the purpose of the API, and to rate their overall understanding of 
the purpose of the endpoints. Finally, we list all the distinct parameter names 
categorized by their location (path, query, form, body) and ask the participants
to mark each one as deceptive or genuine. Note that by default, all answers are set to
genuine, to save the participants from clicking too many times.


3.3 Participants 


We used snowball sampling to reach security experts. We advertised the survey
mainly in two communities: first, among the security researchers, experts and
enthusiasts in a large software company, and second, among the computer security
PhD students of a graduate school. Additionally, we advertised it on social
media (Twitter).

Note that the survey description warns the participants about an estimated 
duration of 30 min, which was determined during the initial evaluation phase.
Participation is completely on a voluntary basis, without any compensation. 
Overall, our advertisement is estimated to reach at least a few hundred people 
and the survey received answers between April 6 and May 24, 2021. 


4 Results 


We received 42 responses, which correspond to 21 participants for each version 
of the survey (Survey I & II). This number of responses allows us to show the 
effectiveness of our method and to make interesting observations. 


4.1 Participants’ Profile 


The majority of participants are software/web developers (19%), security
researchers working in industry (17%) and MSc students doing internships
(17%). Moreover, some PhD students (11%), postdocs, and professors (10%)
have also answered the survey. 5 participants did not answer the question about
their job title. Participants rate their information security experience as 3.5 ± 1.1
on a scale from 1 to 5. Moreover, they rate their knowledge of deception technology
as 2.4 ± 0.9. Overall, the participants seem to have above-average experience
in information security, and an average level of familiarity with deception
technology.


4.2 Participants’ Understanding of the APIs 


In a multi-choice question, we first ask participants to identify the purpose of the 
API. All participants correctly identified both the e-commerce and the labora- 
tory platform applications. Then, we ask participants to rate their understanding 
of the purpose of API endpoints on a scale from 1 to 5. Participants seem to have
a good understanding of the e-commerce endpoints (on average 4 ± 0.6 for E-Commerce_D
and 4 ± 0.5 for E-Commerce_Q) and a fair to good understanding of
the endpoints of the laboratory platform (on average 3.6 ± 0.6 for Lab-Platform_D
and 3.4 ± 0.9 for Lab-Platform_C).
4.3 Indistinguishability of Deceptive Parameters


To see if our method was able to generate deceptive parameters that are indistinguishable
from the genuine application parameters, we analyze the results from
E-Commerce_D (Survey I) and Lab-Platform_D (Survey II). In particular, we compute
a detection rate for each parameter, that is, the ratio of participants who
marked this parameter as deceptive. Then, we compare the detection rates of
the group of deceptive parameters and the group of genuine parameters: We use
Welch’s t-test as the groups have unequal sample sizes and unequal variances [5].
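
The analysis described above can be sketched with the standard library as follows. The survey responses are invented toy data, and the function names are hypothetical; the sketch computes per-parameter detection rates and then Welch's t statistic with the Welch-Satterthwaite degrees of freedom. Obtaining the p-value additionally requires the t distribution, for example via scipy.stats.ttest_ind with equal_var=False.

```python
import math
from statistics import mean, variance


def detection_rates(responses):
    """Fraction of participants who marked each parameter as deceptive."""
    parameters = responses[0].keys()
    return {p: sum(r[p] for r in responses) / len(responses) for p in parameters}


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


# Toy data: one dict per participant, True = marked as deceptive.
responses = [
    {"coupon": True, "ref": False, "id": False, "quantity": False, "page": True},
    {"coupon": True, "ref": True, "id": False, "quantity": False, "page": False},
    {"coupon": True, "ref": False, "id": False, "quantity": True, "page": False},
    {"coupon": False, "ref": True, "id": False, "quantity": False, "page": False},
]
deceptive_names = {"coupon", "ref"}  # assumed honeytokens in the toy API

rates = detection_rates(responses)
deceptive = [rate for name, rate in rates.items() if name in deceptive_names]
genuine = [rate for name, rate in rates.items() if name not in deceptive_names]
t, df = welch_t(deceptive, genuine)
print(rates, round(t, 3), round(df, 3))
```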


Table 3. Detection rate statistics for deceptive and genuine parameters for the APIs 
in default mode. 


                             Detection rate statistics   Welch’s t-test (p < .05)
                             Deceptive     Genuine
E-Commerce_D (Survey I)      24 ± 13       18 ± 14       t = 1, p = .34 (Not significant)
Lab-Platform_D (Survey II)   26 ± 22       12 ± 10       t = 1.8, p = .1 (Not significant)


Table 3 shows the detection rate statistics and the results of Welch’s t-tests 
for both applications. Although detection rates of deceptive elements 
are slightly higher, we do not observe a statistically significant dif- 
ference in comparison to the detection rates of genuine parameters, 
at a confidence level of 95%. This shows that our method is able to generate 
indistinguishable deceptive parameters. 

Previous work suggests that a “high-quality honeytoken” cannot be distinguished
from a real token, even by experts in the relevant field [21]. Thus, the fact
that 71% of our participants are security professionals or web developers
increases confidence in the reliability of our results.


4.4 Impact of the Quantity of Deceptive Parameters 


In this section we aim to answer the following question: Do participants mislabel 
(i.e., fail to detect) more deceptive parameters, when there are significantly more 
of them? 

For each participant, we compute the ratio of deceptive parameters that 
they correctly labeled. We also compute the ratio of parameters that they label 
as deceptive, among all parameters. This second metric refers to participants’ 
expectation (or assumption) on the quantity of deceptive elements.


Table 4. Average ratios of parameters that are labeled as deceptive, among (i)
honeytokens and (ii) all parameters.


                           Ratio of labeled honeytokens   Ratio of labeled parameters in total
E-Commercep (Survey I)     23 ± 16 %                      19 ± 8 %
E-Commerceg (Survey II)    25 ± 19 %                      17 ± 13 %


Table 4 shows that regardless of the quantity of honeytokens, par- 
ticipants label more or less the same percentage of honeytokens as 
deceptive (~23 to 25%). Thus, in E-Commerce p on average 6 of the 8 decep- 
tive parameters remain undetected by participants, while in E-Commerceg, 22 
of 29 are undetected. Moreover, participants’ expectation of the ratio of 
deceptive parameters (among all parameters) remains rather constant 
even when there are statistically significantly more honeytokens (~17 
to 19%). This means that, when there is a larger quantity of realistic deceptive 
elements, there will be more elements that will remain undetected. On the other 
hand, generating a very large number of realistic deceptive elements remains 
a challenge. As discussed in Sect. 3.1, we used a semi-manual approach, as our 
model was able to generate a limited number of high quality deceptive elements. 
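
The undetected counts quoted above follow directly from the average labeling ratios. The short calculation below reproduces that arithmetic; the helper name `expected_undetected` is ours, while the honeytoken counts (8 and 29) and the ratios (23% and 25%) are the ones reported in the text and in Table 4.

```python
def expected_undetected(total_honeytokens, labeled_ratio):
    """Expected number of honeytokens that a participant does not label."""
    return total_honeytokens * (1.0 - labeled_ratio)


# (description, number of honeytokens, average ratio labeled as deceptive)
scenarios = [
    ("8 honeytokens, 23% labeled", 8, 0.23),
    ("29 honeytokens, 25% labeled", 29, 0.25),
]

for description, total, ratio in scenarios:
    print(description, "->", round(expected_undetected(total, ratio)), "undetected")
```

Because the labeled ratio stays roughly constant, the undetected count grows almost linearly with the number of realistic honeytokens.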

Finally, we also compare the detection rates of the deceptive parameters in
E-Commerceg that were automatically generated by our model (14 parameters) with
those that were manually created by us (15 parameters), to see how our approach
compares to manual selection. The average detection rate is 21 ± 9% for
automated honeytokens and 23 ± 11% for manual honeytokens. Applying Welch’s
t-test, we do not see a significant difference between the detection rates
(t = -0.44, p = .66). Thus, we can conclude that automatically generated
parameter names were as realistic as manually selected ones.


4.5 Impact of the Conspicuous Deceptive Parameters 


In this section we aim to answer the following question: Do participants misla- 
bel (i.e., fail to detect) more deceptive parameters, when there are some very 
obvious (conspicuous) honeytokens added as extra? The idea is that conspicuous 
honeytokens might help to hide realistic honeytokens, by attracting participants’ 
attention. 


Table 5. Average ratios of parameters that are labeled as deceptive, among (i)
honeytokens and (ii) all parameters.


                             Ratio of labeled honeytokens   Ratio of labeled parameters in total
Lab-Platformp (Survey II)    26 ± 16 %                      13 ± 10 %
Lab-Platformc (Survey I)     25 ± 14 %                      12 ± 7 %
(excluding conspicuous)




Table 5 shows that, if we exclude the conspicuous parameters in Lab-Platformc,
participants label more or less the same percentage of honeytokens as deceptive
(~25 to 26%) in both Lab-Platformp and Lab-Platformc. Moreover, participants’
expectation of the ratio of deceptive parameters again remains more or less
constant (~12 to 13%). Thus, we do not observe any significant impact of
adding conspicuous honeytokens on further disguising the realistic
honeytokens.

On the other hand, a significantly higher number of participants label the
conspicuous honeytokens as deceptive (on average, 72 ± 11%), in comparison to
the realistic honeytokens (on average, 24 ± 30%) in Lab-Platformc (Welch’s
t-test: p = .001). Thus, we believe that conspicuous honeytokens can be used
to tip off the attacker about the presence of deception, in order to enable
the deception awareness effect that we will discuss next.


4.6 Deception Awareness Effect 


In this section, we look at the ratio of genuine parameters that are labeled as
deceptive by the participants. On average, E-Commercep and E-Commerceg have
18 ± 8% and 15 ± 11% of the genuine parameters mislabeled, respectively. These
ratios are 12 ± 10% and 11 ± 7% for Lab-Platformp and Lab-Platformc. Thus, we
observe that at least 10% of genuine parameters were marked as deceptive
across all APIs, which means that participants would avoid tampering with those
parameters in an attack scenario. We can interpret this as the effect of
deception awareness. Previous studies already observe various benefits of
informing attackers about the presence of deception, such as compelling them to
modify their attack behavior, impeding the attack progress, and deteriorating
attackers’ cognitive and psychological state [29,47]. Our results demonstrate
yet another benefit of deception awareness: it makes the real application
elements look like traps.


5 Limitations and Discussion 


Method: In this study we only considered the source code of Java web
applications from public GitHub repositories to train the model. However, it is
possible to enrich the model with other codebases and projects using different
web technologies (e.g., PHP, Node.js). Note that the number of high-quality
parameters that can be generated by the model depends on the richness of the
training data. Another limitation of our approach is that it is not able to
generate compound parameter names. This can be done as a manual post-processing
step (e.g., by adding a common prefix or suffix to some parameter names), or it
would require training a model that treats compound words as single words, if a
proper training set is available. In addition, although we only used word2vec,
combining it with other NLP approaches (discussed in Sect. 2) is also possible.


Evaluation: The results of our evaluation survey only provide insights about
whether the participants were able to distinguish between the generated
parameter names and the names of genuine application parameters. Thus, these
results should not be considered as a measurement of the effectiveness of
deceptive elements in attack detection. On the other hand, it is important to
note that the deceptive parameters mainly aim to detect attacks via attackers’
interaction (e.g., tampering with the parameter), as opposed to traditional
honeypots that aim to waste attackers’ time and resources. This means that, as
soon as an attacker interacts with a deceptive parameter (e.g., with a fuzzing
tool), the attacker will be detected and the application will respond
accordingly (e.g., by blocking the request or routing to a clone system). Thus,
having realistic deceptive parameters becomes a first requirement to ensure the
effectiveness of deception.

As mentioned in Sect. 3.2, in the evaluation survey we only have deceptive 
and genuine options to choose between. Thus, participants are forced to make a 
choice even when they are not sure about the answer. In fact, we have received 
a few post-survey comments where the participants found some parameters to 
be implausible, but they were not sure if it was just due to bad API design 
practices, or due to deception. For instance, one participant stated that: 


Some of these APIs look off from a programming perspective. Why would 
you include < variable > as a query string when it might be more efficient 
to use it elsewhere? 


Another participant said: 


I would be extra careful in a situation like this and mark things [that 
maybe are not deceptive] as deceptive just in case. Taking into account 
that programmers are not perfect, they may create parameters that are not 
needed. So I think this is not needed, but is it because it is deceptive or it 
was done like this in reality... My general approach when doing tampering 
is, just touch what you are sure of. 


These comments imply a few points. First, it is likely that there will always
be a suspicion about implausible-looking elements. Second, it is important to
keep the coherence between API endpoints and imitate realistic functionality for
the generated deceptive elements. Finally, we believe that obliging participants
to make a decision is a more realistic approach, as in a real attack scenario
they would need to decide whether to tamper or not. In fact, previous work
observes via a CTF-based experiment that, although most participants are
initially very careful not to touch the suspicious-looking elements, they give
up on such precautions after some time if they cannot find an attack vector to
progress [47].


6 Related Work 


While there are many studies that aim to generate various deceptive content or
honey elements, we focus on the ones that relate to web application security.
HoneyGen [21] aims to create a relational database with fake entries, based on
the rules extracted from a real database. For evaluation, the method is applied
to a database from a real-world dating website, to create fake profiles with
different personal information attributes. The experiment involved 30 pairs of
profiles, each pair having one real and one fake persona. The 109 participants
who joined the experiment were unable to distinguish the fake profiles that have
high similarity to the real profiles. B.Hive [45] aims to generate honey form
field names using a dataset of form fields collected from top websites. While
this approach is limited to form parameters from pre-authentication pages, our
approach targets all types of HTTP parameters and a wide range of application
contexts. A more recent study [19] proposes to allow the user to enrich the UI
of a web application with custom honey HTML elements (e.g., link, button, icon),
via a browser extension. The idea is that genuine users would be aware of these
‘tripwires’ (and not interact with them), but an attacker could easily click on
them after gaining access to the account. While the names of the honey HTML
elements are ideally chosen by the user, the authors also implement a suggestion
tool based on a Markov model of URLs gathered from the Common Crawl [2]
dataset. However, the paper does not provide an evaluation of the quality of the
suggested names.

BogusBiter [51] proposes to generate honey credentials that will be fed into 
phishing pages to conceal the real credentials of the user. The idea is to start with 
an initial set of credentials, and generate additional credentials by substituting 
certain characters of the username and password with different characters, each 
time. Other relevant studies propose different approaches for password guess- 
ing, based on a combination of specialized lists, lexical dictionaries and word 
embeddings [39] or deep learning techniques [36]. 


7 Conclusion 


This work automates the generation of realistic deceptive parameter names for
different types of HTTP parameters. We demonstrate the effectiveness of our
method via a survey-based experiment, and find that the participants anticipate
a certain ratio of elements to be deceptive, regardless of the actual quantity
or enticement level of the honeytokens. Additionally, we observe that at least
10% of genuine API parameters were marked as deceptive by the participants,
which demonstrates the potential benefit of informing the attackers about the
presence of honeytokens. Finally, we provide various directions for future work
by looking into the challenges that need to be addressed for a complete
automation of API layer deception.


Abstract. Industrial Control System (ICS) protocols are essential to
establish communications between system components. Recent cyber-attacks
have shown that the vulnerabilities in ICS protocols pose enormous threats
to ICS security. However, the efficiency of traditional black-box fuzzing
techniques is constrained when the protocol specifications are not publicly
available.

In this paper, we introduce the ICS Protocol Specification Extraction
(IPSpex) method to improve black-box fuzzing efficiency by analyzing the
network packet construction in industrial software. We extract message
field semantics from network traffic, collect execution traces from network
packet construction, and extract the message format using backward data
flow tracking and sequence alignment algorithms. Our evaluation shows that,
compared to Wireshark, IPSpex achieves high correctness and perfection on
three common ICS protocols: Modbus/TCP, S7Comm and FINS. We further combine
IPSpex with boofuzz to test an undocumented ICS protocol, UMAS. In total, we
found five 1-day vulnerabilities and two 0-day vulnerabilities.


Keywords: ICS protocol reverse engineering · Memory trace ·
Black-box fuzzing


1 Introduction 


Industrial Control Systems (ICS) are systems used to monitor and control
industrial real-time processes in critical national infrastructures, including
the power grid, water treatment and the chemical industry. Stability and
reliability are of paramount importance to ICS. With the ongoing convergence
between Operational Technology (OT) and Information Technology (IT), emerging
cyber-attacks against industrial components have become potential threats to
ICS, which can result in significant economic impacts, severe environmental
disasters and even casualties [2,3,9,11].

Over the past few years, the extensive application of Ethernet in the
industrial environment has raised widespread public concern about the security
of proprietary ICS protocols [8]. Historically, because the industrial
engineering stations were trusted in closed networks, these domain-specific
protocols were initially designed with little attention to security [26,41].
However, recent research has shown that improper input to ICS components,
caused by vulnerabilities in ICS protocols, has become the most common
architectural weakness of ICS [22]. With the increasing number of ICS components
connecting to external networks [40], a lack of awareness of this attack surface
may lead to unpredictable security breaches.

A common approach to discovering the vulnerabilities in ICS protocols is to
conduct fuzz testing. Since most ICS devices are built on closed operating
systems, it is infeasible to use widely used grey-box fuzzing methods. Current
fuzzing methods for ICS protocols mostly use the protocol specification
extracted from network packets to facilitate black-box fuzzing [12,25,28].
However, the performance is limited by the diversity of available network
packets. In practice, due to the privacy and complexity of ICS protocols, it is
hard to obtain sufficient ICS network packets from open-source repositories or
real-world industrial environments, which limits the efficiency of black-box
fuzzing [21].

Inspired by IOTFuzzer [18] and DIANE [37], we shift focus to industrial
engineering software, which provides hardware and software configuration,
program development and real-time process monitoring for ICS devices. In fact,
the construction of the network packets sent by industrial engineering software
reveals abundant information about ICS protocol specifications. However,
traditional binary analysis methods are constrained for the following reasons:
(1) Commercial industrial engineering software usually only supports online
operation with a connection to ICS devices, which continuously produces a large
number of network packets within a few seconds. This frequent network behaviour
makes it troublesome to analyze the network packet construction in industrial
engineering software. (2) Most ICS protocols are designed in binary format,
without the keywords and separators found in traditional network protocols such
as HTTP. Identifying the field boundaries from execution traces without this
prior knowledge is also challenging.

In this paper, we introduce the ICS Protocol Specification Extraction (IPSpex)
method to improve the efficiency of black-box fuzzing by analyzing the network
packet construction in industrial engineering software. To overcome the
aforementioned difficulties, we use field semantics, which are extracted from
network traffic in advance, to locate the target message during program
execution. We then apply a novel mechanism designed for industrial engineering
software to capture the execution traces. Finally, we combine backward data
flow tracking with sequence alignment algorithms to extract the message format
from execution traces. For evaluation, we compare our results with Wireshark.
In general, IPSpex achieves high correctness and perfection on three widely used
ICS protocols: Modbus/TCP, S7Comm and FINS. To show how IPSpex can benefit
black-box fuzzing, we use IPSpex to extract the protocol specification of a
real-world undocumented ICS protocol, UMAS, and then we build a fuzzer based on
boofuzz [4]. In total, we find five 1-day vulnerabilities that have been
published on Talos [6], and two 0-day vulnerabilities.


Contributions. In summary, we make the following main contributions. 


- We develop IPSpex, a protocol specification extraction method for ICS
  protocols, which automatically extracts the field semantics and message format
  to facilitate fuzzing.

- We evaluate IPSpex on three widely used ICS protocols based on three
  open-source software libraries. The results show that it achieves high
  correctness and perfection compared to Wireshark.

- We use IPSpex to find the vulnerabilities of a real-world undocumented ICS
  protocol, UMAS. In total, we find five 1-day vulnerabilities and two 0-day
  vulnerabilities.


2 Background 


2.1 ICS Protocols 


ICS protocols are widely used by industrial engineering stations, field devices 
and systems such as Programmable Logic Controller (PLC), Remote Terminal 
Unit (RTU), Distributed Control System (DCS) and Industrial Communica- 
tion Device (ICD) [39], as shown in Fig. 1. As Industrial Ethernet continues to 
expand, it has recently surpassed traditional field bus architectures to become 
the leading connection methodology in plants around the world [1]. Industrial 
Ethernet mainly uses domain-specific ICS protocols such as Modbus/TCP and 
S7Comm, encapsulated within the Ethernet protocol. However, due to the con- 
vergence between industrial Ethernet and traditional Ethernet, this convenience 
also creates new external threat vectors such as eavesdropping, denial of service 
and unauthorized device control. 


[Figure: Operation Station, History Database, Engineering Station, Field Bus,
Wireless Network, ICS Devices]

Fig. 1. Typical Application Scenarios of ICS Protocols


ICS devices, such as PLCs and RTUs, commonly use proprietary ICS protocols to
communicate with ICS engineering software and other ICS devices. A common
approach to discovering the vulnerabilities in ICS protocols is to conduct a
penetration test against ICS devices. Fuzzing has proved to be very effective in
finding vulnerabilities in real-world software [31]. Specifically, grey-box
fuzzing leverages the coverage information collected from the target software to
achieve higher performance when source code is not available [45]. However, a
majority of ICS devices run on closed real-time operating systems (e.g.,
VxWorks [10], ADONIS [7]) or customized runtime systems (e.g., CODESYS [5]). As
current embedded firmware emulation techniques cannot handle these complex
operating systems [19,24], it is infeasible to employ grey-box fuzzing tools
based on dynamic binary instrumentation.


2.2 ICS Protocol Reverse Engineering 


Protocol reverse engineering is an effective method to extract protocol
specifications. Due to the challenges of employing binary instrumentation on the
firmware of ICS devices, most work uses network-based protocol reverse
engineering methods on ICS protocols. IPART applies an extended voting expert
algorithm to infer the boundaries of industrial protocol fields. It then
classifies messages into sub-clusters for protocol message format inference
[42]. A full process has been proposed for ICS protocol analysis, including
fixed position field inference, assembling of fragmented packets,
variable-length field inference and client-server relationship field inference
[17]. The structure of a real-world private ICS protocol used by Schneider
Modicon M580 is analyzed in [38], including message extraction and clustering,
field extraction and message format inference.

However, these methods only focus on static network traffic analysis, which
cannot achieve high performance on undocumented ICS protocols because it is
severely limited by the diversity of available network packets. For instance, a
message usually consists of several static and dynamic fields indicating
different physical meanings. A lack of diversity in the network packets can
mislead these methods into regarding dynamic fields as static fields, which
causes wrong classification of field types. Therefore, there is a strong
possibility that the protocol messages in network traffic only cover a small
subset of the entire definitions and cannot provide a full-scale description of
ICS protocol specifications.


2.3 ICS Protocol Fuzzing 


Recently, some grey-box fuzzing methods for ICS protocols have been proposed to
guide the generation of new inputs. Peach* collects the coverage information
during the testing procedure, saves the valuable packets that trigger new path
coverage and breaks them into pieces, which are used to construct higher-quality
new packets for further testing [30]. Polar combines static analysis and dynamic
taint analysis on software libraries to identify the function code and related
information, which can guide the generation of test cases [29]. GANFuzz uses a
generative adversarial network to learn the protocol grammar and produces fake
but plausible messages for fuzzing [25]. PropFuzz uses the Ratcliff/Obershelp
pattern recognition algorithm to analyse the valid process of connection
establishment and then sends protocol-specific commands [34].

However, because these efforts only focus on open source software libraries
such as libmodbus, lib60870 and libiec61850, they do not show the impact of the
vulnerabilities in these libraries on ICS devices. In fact, the vulnerabilities
found by these methods are not sufficient to reflect the weaknesses of
real-world ICS devices, which limits their further application in practice.


3 Problem Statement 


To efficiently facilitate black-box fuzzing of ICS devices, both the message
format and the field semantics should be carefully defined in the fuzzing
template. The message format is the foundation for protocol reverse engineering,
which defines the boundaries between different fields. Our goal is to design a
method to extract the message format from the network packet construction in
industrial engineering software. Previous research [27] also focuses on message
format, and recovers it by monitoring the message parsing process in open source
software libraries. In practice, the parsing process of these messages is
implemented in a much more complex way in ICS devices, so the results are not
sufficient to facilitate fuzzing of these devices.

As for field semantics in ICS protocols, we focus on the length field and the
function code field. The length field must be correctly calculated so that the
following bytes are valid. It is also necessary to infer the function code field
because ICS devices will not process the data following an undefined function
code. Therefore, it is extensively used in ICS protocol fuzzing [29,30]. In
addition, we infer the field type from execution traces, namely, whether a field
is constant, which can decrease unnecessary mutation in black-box fuzzing.
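
To illustrate why these two fields matter to a fuzzer, the sketch below builds a Modbus/TCP request whose length field is recomputed from the payload. It is an illustration only: the helper name `build_modbus_tcp_request` is ours, and the header layout (transaction id, protocol id, length, unit id, followed by the function code and data) is taken from the public Modbus/TCP specification rather than from the output of IPSpex.

```python
import struct


def build_modbus_tcp_request(transaction_id, unit_id, function_code, data):
    """Build a Modbus/TCP frame whose length field matches its payload."""
    pdu = bytes([function_code]) + data
    # The length field counts the bytes that follow it: unit id + PDU.
    length = 1 + len(pdu)
    # Header: transaction id, protocol id (0 for Modbus), length, unit id.
    header = struct.pack(">HHHB", transaction_id, 0, length, unit_id)
    return header + pdu


# Read Holding Registers (function code 0x03): start address 0, 2 registers.
frame = build_modbus_tcp_request(1, 0xFF, 0x03, struct.pack(">HH", 0, 2))
print(frame.hex())
```

A fuzzer that mutates `data` without recomputing `length`, or that picks an undefined `function_code`, mostly exercises the device's early rejection path rather than its message handlers.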


4 Methodology 


4.1 Overview 


For most industrial software, operation commands are only accepted by indus- 
trial devices when they are converted to online mode, which continuously brings 
a large number of network packets in a few seconds. It is hard to locate the 
specific message in network traffic for an undocumented ICS protocol because 
the field semantics are unknown. Moreover, the frequent generation of network 
packets makes it troublesome to capture the execution trace that reflects target 
packet construction because the memory layouts of these network packets are 
overlapped. 

To address these problems, we propose to use a buffer to store the execution
traces. Only if the target message is detected by monitoring the send function
do we output the execution traces and exit the software; otherwise, the buffer
is cleared. The target message can be identified by the field semantics
extracted from network traffic. After obtaining the execution traces, we use a
backward data flow tracking algorithm to extract the memory trace for each byte
in a message. We then leverage a sequence alignment algorithm to extract the
message format from these memory traces. Field types are also inferred from the
memory traces.

Figure 2 depicts the architecture of IPSpex, which comes with three main
modules: (1) field semantic extractor, (2) context-aware execution monitor and
(3) protocol specification analyzer.


[Figure: Engineering Station, Industrial Device, Messages, Traces, Field
Semantics, Message Format, Field Type]

Fig. 2. IPSpex: An Architectural Overview


4.2 Field Semantic Extractor 


The field semantic extractor aims to automatically identify the length and func- 
tion code field from network traffic, which are used to locate specific message in 
the next step. 


Length Field. The length field in most ICS protocols is a variable field that
indicates part or all of the length of a single message. Generally, we divide
the fields in one message into fixed-length fields and variable-length fields,
and only the latter contribute to changes in the length of the message. Since
the changes of the length field and of the message length are both caused by
variable-length fields, we can eliminate these changes by taking the difference
between the value of the length field and the length of the message. The result
is the total length of the fixed-length fields, which is a statistical constant.
Based on this observation, we combine n-gram analysis and statistical frequency
analysis to identify the offset and length of the length field (n is generally 2
or 4). If the frequency of the constant exceeds a given threshold (e.g., 0.95),
we conclude that the candidate is the length field. In addition, there should be
some messages whose length is greater than 0xff in order to determine the length
of the length field.
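
A minimal sketch of this inference is given below. It is not the IPSpex implementation: the function names, the toy protocol (2-byte magic, 2-byte big-endian length, 1-byte function code, payload) and the extra sanity check that the constant must lie between 0 and the shortest message length are assumptions made for illustration.

```python
from collections import Counter


def find_length_field(messages, widths=(2, 4), threshold=0.95):
    """Return (offset, width, byteorder, constant) candidates for the length field."""
    candidates = []
    shortest = min(len(m) for m in messages)
    for width in widths:
        for offset in range(shortest - width + 1):
            for order in ("big", "little"):
                # Message length minus candidate value: constant for a real length field.
                diffs = Counter(
                    len(m) - int.from_bytes(m[offset:offset + width], order)
                    for m in messages
                )
                constant, count = diffs.most_common(1)[0]
                # Assumption of this sketch: the constant is the size of the
                # uncounted fixed-length fields, so it cannot be negative or
                # larger than the shortest message.
                if count / len(messages) >= threshold and 0 <= constant <= shortest:
                    candidates.append((offset, width, order, constant))
    return candidates


def make_message(payload):
    """Toy message: magic, big-endian length of what follows, function code, payload."""
    body = b"\x01" + payload
    return b"\xAB\xCD" + len(body).to_bytes(2, "big") + body


# Payload sizes are hypothetical; one message is longer than 0xff on purpose.
messages = [make_message(bytes(n)) for n in (3, 10, 40, 300)]
print(find_length_field(messages))
```

In this toy capture, the message longer than 0xff is what rules out a misaligned little-endian reading that would otherwise look constant on the three short messages, which matches the remark above about needing such messages.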


Function Code Field. The function code of ICS protocols represents the
domain-specific function of a single message. Since the values of the function
code field always differ between functions, the number of functions in network
traffic should be equal to the number of distinct values in the function code
field. Based on this insight, we manually generate messages for several
different functions (at least 2 for each function) using industrial software,
filter the heartbeat packets, and label the messages of these functions. Then,
for the messages of each function, we compute byte frequency statistics and keep
the fields that are invariant for this function among these messages. Finally,
we collect all the invariant fields, count the number of distinct values of each
and regard the field whose number equals the number of functions as the function
code field.
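
The procedure can be sketched as follows, under the simplifying assumption that the function code occupies a single byte. The function name, the labels and the captured messages are invented for the example and do not come from a real protocol.

```python
def find_function_code_offsets(labeled_messages):
    """labeled_messages maps a function label to the messages captured for it."""
    shortest = min(len(m) for msgs in labeled_messages.values() for m in msgs)
    offsets = []
    for offset in range(shortest):
        values = []
        for msgs in labeled_messages.values():
            observed = {m[offset] for m in msgs}
            if len(observed) != 1:
                break  # the byte varies within one function: not invariant
            values.append(observed.pop())
        else:
            # Invariant inside every function; keep the offset only if each
            # function has its own distinct value there.
            if len(set(values)) == len(labeled_messages):
                offsets.append(offset)
    return offsets


# Hypothetical capture: magic (2 bytes), sequence number, function code, data.
labeled = {
    "read": [bytes.fromhex("abcd0103aa"), bytes.fromhex("abcd0203bb")],
    "write": [bytes.fromhex("abcd0310cc"), bytes.fromhex("abcd0410dd")],
    "stop": [bytes.fromhex("abcd0529ee"), bytes.fromhex("abcd0629ff")],
}
print(find_function_code_offsets(labeled))
```

The magic bytes are invariant but take a single value across all functions, and the sequence number varies within a function, so only the offset of the function code survives both filters.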

After finding the offset and length of both length and function code field to 
identify the target message in network traffic, we can determine the execution 
trace to capture. 


4.3 Context-Aware Execution Monitor 


The goal of context-aware execution monitor is to capture the execution traces 
for further analysis, which also provides an instruction classification and records 
the address of real-time memory access. 

The context-aware execution monitor uses instruction-level instrumentation to
record execution traces. Once the engineering software is started, the
instruction buffer is initialized. However, the context-aware execution monitor
only starts to fill the instruction buffer with tuples once the connection to
the ICS device is established. We intercept network functions such as send, and
decide whether the message in the socket buffer has the same field features as
the target message, including the length and function code fields. If not, the
instruction buffer is cleared. After identifying the target message in network
traffic, we output the execution traces in the buffer and exit the software.

To conduct precise data dependency analysis later, each tuple in the
instruction buffer consists of four elements: type, instruction, operands and
memory address (e.g., {R|mov|esi, dword ptr [rbp + 8]|0x163f200}). For further
memory trace construction, we divide the instructions into three categories:
memory read (R), memory write (W) and others (X). For memory read and write
instructions, we record the real-time address of the memory access. At the same
time, the address and size of the socket buffer are recorded.
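
The buffer-and-flush behaviour described above can be simulated without any instrumentation framework. In the sketch below, the event stream, the `TraceEntry` type, the `capture_target_trace` helper and the target predicate are all hypothetical stand-ins; a real monitor would receive these events from an instruction-level instrumentation tool and from a hook on send.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceEntry:
    kind: str                # "R" (memory read), "W" (memory write) or "X" (other)
    mnemonic: str
    operands: str
    address: Optional[int]   # accessed memory address, None for "X" entries


def capture_target_trace(events, is_target):
    """Return the trace buffered before the first send() carrying the target message."""
    buffer = []
    for event, payload in events:
        if event == "ins":
            buffer.append(payload)
        elif event == "send":
            if is_target(payload):
                return list(buffer)  # dump the trace and stop monitoring
            buffer.clear()           # unrelated message: discard its trace
    return None


def is_target(message):
    """Hypothetical field features: function code 0x05 at offset 4."""
    return len(message) > 4 and message[4] == 0x05


events = [
    ("ins", TraceEntry("W", "mov", "byte ptr [rax], 0", 0x1000)),
    ("send", b"\xab\xcd\x00\x02\x00\x00"),  # heartbeat, trace discarded
    ("ins", TraceEntry("R", "mov", "esi, dword ptr [rbp + 8]", 0x163F200)),
    ("ins", TraceEntry("X", "add", "esi, 1", None)),
    ("send", b"\xab\xcd\x00\x02\x05\x01"),  # target message
]
trace = capture_target_trace(events, is_target)
print(len(trace), hex(trace[0].address))
```

Clearing the buffer on every non-matching send keeps the captured trace limited to the instructions executed since the previous message, which is what makes the later backward analysis tractable.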


4.4 Protocol Specification Analyzer 


The protocol specification analyzer aims to find the message format and field
types using backward data flow tracking and sequence alignment algorithms. As a
first step, we construct memory traces through backward data flow tracking for
each byte in a single message, based on which we extract the message format
using a sequence alignment algorithm. In addition, the function and field types
are also inferred from execution traces.

After obtaining the address and length of the socket buffer, we build a memory
trace for each byte in this address space using a backward data flow tracking
algorithm, which processes different types of instructions to recognize the
corresponding memory regions. The details are presented in Algorithm 1. IPSpex
supports a subset of x86 instructions that are divided into six categories:
derived move instructions (e.g., mov and movzx), stack operation instructions
(e.g., push, pop and ret), explicit calculation instructions (e.g., add, xor,
shl and inc), implicit calculation instructions (e.g., imul and idiv), data
exchange instructions (xchg) and byte swap instructions (bswap). The
instructions supported by IPSpex are enough to extract the message format and
stop the construction of the memory trace.


Algorithm 1. Memory Trace Construction
Input: The address and length of socket buffer, instructions
Output: [mt1, mt2, ..., mtn]: one memory trace mapping one byte

insList ← INVERT(instructions)
MemTraceList ← ∅
for each offset ∈ [0, BufferLength] do
    MemTrace ← ∅
    BufAddr ← BufferAddress + offset
    DepChain ← GENDEPCHAIN(BufAddr)
    StackTag ← 0
    for each ins ∈ insList do
        if ISMEMACCESS(ins) then
            addr ← INSPARSE(ins)
            type ← GETTYPE(ins)
            DataLoc ← GENPLOC(DepChain, addr, type, StackTag)
        else
            DataLoc ← GENDATALOC(ins, DepChain)
            stack ← UPDATESTACK(ins, StackTag)
        end if
        DepChain ← DepChain ∪ DataLoc
        MemTrace ← ADDDATALOC(MemTrace, DataLoc)
    end for
    MemTraceList ← MemTraceList ∪ MemTrace
end for
return MemTraceList

It is worth noting that the same memory address may store values of different 
fields. For example, stack slots can be reused through push and pop instructions. 
If different fields are placed on the stack, their memory addresses will overlap, 
which may cause wrong results in the later message format inference. Therefore, 
we use a stack tag (a sequence number) to record these operations and to tell 
the different fields apart. 
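
To make the backward tracking step concrete, the following minimal sketch walks a toy trace in reverse for one socket-buffer byte. It is an illustration under simplifying assumptions and not the IPSpex implementation: each instruction is reduced to a hypothetical (destination, source) pair, locations are plain strings, the addresses are illustrative, and stack tags are omitted.

```python
from typing import List, Tuple

# Hypothetical, simplified trace entry: (destination, source).
# Locations are strings such as "mem:0x163f190", "reg:eax" or "imm:0x10".
Instruction = Tuple[str, str]


def build_memory_trace(trace: List[Instruction], buffer_loc: str) -> List[str]:
    """Walk the trace backward from one socket-buffer byte and collect the
    memory locations and immediates that its value passed through."""
    current = buffer_loc
    memory_trace: List[str] = []
    for dst, src in reversed(trace):
        if dst != current:
            continue  # this instruction does not define the tracked location
        if src.startswith(("mem:", "imm:")):
            memory_trace.append(src)
        if src.startswith("imm:"):
            break  # a constant has no earlier source
        current = src
    return memory_trace


# Toy trace in execution order (addresses are illustrative only).
trace = [
    ("reg:eax", "mem:0x163f190"),   # mov eax, [0x163f190]
    ("mem:0x163f184", "reg:eax"),   # mov [0x163f184], eax
    ("reg:ecx", "mem:0x163f184"),   # mov ecx, [0x163f184]
    ("mem:0x12c60238", "reg:ecx"),  # mov [0x12c60238], ecx (socket buffer)
]
print(build_memory_trace(trace, "mem:0x12c60238"))
```

The returned list is ordered from the location closest to the socket buffer back to the original source, which is the order in which the backward pass discovers them.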

Figure 3 shows an example of the memory traces for the write area function of 
the S7Comm protocol. It illustrates the sequence of memory addresses accessed 
for each byte in a message, together with the field semantics that correspond 
to the memory traces. 


Message Format. The message format of a single message can be extracted 
using a sequence alignment algorithm, which determines whether the memory 
traces of two adjacent bytes are spatially continuous. We present some strategies 
[Figure 3: for each socket buffer byte (0x12c60234 to 0x12c6023d), the chain of 
memory addresses from which its value originates, annotated with the fields 
Syntax ID, Transport size, Length, DB number, Area and Address.] 

Fig. 3. Memory Trace of Write Area Function in S7Comm Protocol 


to determine this spatial continuity. First, program locations of different 
types in the memory traces are assigned to different fields. Second, if the two 
locations are both immediate locations and their values are equal, we merge them 
into the same field. Third, if the two program locations are both memory 
locations, we merge them only if they lie in a continuous memory region and 
their stack tags are the same. These strategies are implemented in Algorithm 2. 


Algorithm 2. Message Format Extraction 
Input: [mt_1, mt_2, ..., mt_n]: one memory trace mapping one byte 
Output: [offset_1, offset_2, ..., offset_n]: field offsets for each byte 
 1: MemTraceLen ← GETSIZE(MemTraceList) 
 2: MsgFormat ← INITIALIZE(MemTraceLen) 
 3: for each i ∈ [1, MemTraceLen − 1] do 
 4:     MemTrace1 ← MemTrace_i 
 5:     MemTrace2 ← MemTrace_{i+1} 
 6:     MinSize ← GETMINSIZE(MemTrace1, MemTrace2) 
 7:     for each j ∈ [0, MinSize] do 
 8:         type ← TYPECOMP(MemTrace1_j, MemTrace2_j) 
 9:         if SEQCOMP(MemTrace1_j, MemTrace2_j, type) then 
10:             MsgFormat ← MERGE(MsgFormat, i) 
11:         end if 
12:     end for 
13: end for 
14: return MsgFormat 
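
The three merging strategies can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: the `Location` tuple layout is hypothetical, and the sketch assumes that two adjacent bytes are merged only when all aligned positions of their traces are continuous, which is one possible reading of the inner loop of Algorithm 2.

```python
from typing import List, Tuple

# Hypothetical program location: ("imm", value, 0) or ("mem", address, stack_tag).
Location = Tuple[str, int, int]


def continuous(a: Location, b: Location) -> bool:
    """Apply the three strategies to one pair of aligned locations."""
    if a[0] != b[0]:
        return False  # different location types never share a field
    if a[0] == "imm":
        return a[1] == b[1]  # immediates must carry the same value
    # memory locations: adjacent addresses and identical stack tags
    return b[1] == a[1] + 1 and a[2] == b[2]


def extract_fields(traces: List[List[Location]]) -> List[int]:
    """Return, for every byte, the offset of the field it belongs to."""
    offsets = list(range(len(traces)))  # every byte starts as its own field
    for i in range(len(traces) - 1):
        first, second = traces[i], traces[i + 1]
        n = min(len(first), len(second))
        # Assumption: merge only if every aligned pair is continuous.
        if n > 0 and all(continuous(first[j], second[j]) for j in range(n)):
            offsets[i + 1] = offsets[i]
    return offsets


example = [
    [("imm", 0x10, 0)],        # constant byte
    [("mem", 0x163F11C, 0)],   # first byte of a two-byte field
    [("mem", 0x163F11D, 0)],   # second byte, adjacent address
    [("mem", 0x163F190, 1)],   # unrelated location with another stack tag
]
print(extract_fields(example))
```

Bytes that share an offset in the result belong to the same field, so the example yields one constant field, one two-byte field and one single-byte field.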


Field Type. We can generally classify fields as constant or non-constant 
according to their values, or according to whether their addresses fall in the 
data section of the corresponding module, which is recorded when the program is 
loaded. Immediate values in the memory traces indicate that the fields of a 
specific function are constant and may be checked by the target device. 
Therefore, these fields cannot be changed if valid commands are to be sent, 
which is especially important for test case generation in fuzzing. 
5 Evaluation 


5.1 Experimental Setup 


We implement IPSpex using Pin 3.16 and Python 3.7.4. We use three public 
software libraries, libmodbus v3.1.6 [35], snap7 v1.4.2 [33] and libfins [13], 
to collect execution traces. We select these libraries because they use typical 
ICS protocols to communicate with industrial devices. Like previous work 
[15,16,20], we compare IPSpex with Wireshark. 


5.2 Overview 


Although Wireshark provides parsers for a variety of ICS protocols, it is 
sometimes unreliable. Therefore, to evaluate the effectiveness of IPSpex 
thoroughly, we compare our results with those of the latest Wireshark v3.2.7, 
using the message formats extracted from the source code of libmodbus, snap7 
and libfins as the ground truth. As in previous studies [44], we use two 
metrics, correctness and perfection, which are derived from per-message 
indicators, to reflect effectiveness. 


Indicators. For one network message, we use $F_s$ to denote the set of message 
fields defined in the source code, and $F_i$ and $F_w$ to denote the sets of 
fields identified by IPSpex and Wireshark, respectively. Their sizes are denoted 
by $|F_s|$, $|F_i|$ and $|F_w|$. 

If the offsets of the fields identified by IPSpex or Wireshark are the same as 
those in $F_s$, we place those fields in a correct set, denoted by $S_c$. 
Otherwise, we place them in a false set, denoted by $S_f$. For the fields in the 
correct set, if the identified length is also identical to the length of the 
corresponding field in $F_s$, we group them into a perfect set, denoted by 
$S_p$. Otherwise, we group them into an incorrect set, denoted by $S_e$. Their 
sizes are denoted by $|S_p|$, $|S_e|$ and $|S_f|$, respectively. In general, for 
one message we have (x equals i or w): 

$$|F_x| = |S_p| + |S_e| + |S_f| \qquad (1)$$ 


The overall performance of IPSpex on one protocol is derived from the above 
indicators as follows, where N is the number of functions evaluated for the 
protocol and j is the index of a function. We use the same metrics to measure 
the performance of Wireshark. 


— Correctness. We define correctness as the ratio of the number of correct 
fields identified by IPSpex or Wireshark to the number of fields they identify 
(x equals i or w): 

$$R_c = \frac{1}{N} \sum_{j=1}^{N} \frac{|S_{p,j}| + |S_{e,j}|}{|F_{x,j}|} \qquad (2)$$ 
— Perfection. We define perfection as the ratio of the number of perfectly 
identified fields found by IPSpex or Wireshark to the number of fields defined 
in the source code: 

$$R_p = \frac{1}{N} \sum_{j=1}^{N} \frac{|S_{p,j}|}{|F_{s,j}|} \qquad (3)$$ 
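
As a worked illustration of the two metrics, the sketch below averages the per-function ratios exactly as the definitions above describe. The dictionary keys and the sample counts are made up for the example and are not taken from the evaluation.

```python
from typing import Dict, List

# One record per evaluated function (hypothetical key names):
#   "source"     number of fields defined in the source code
#   "identified" number of fields reported by the tool
#   "perfect"    fields with correct offset and correct length
#   "incorrect"  fields with correct offset but wrong length
Result = Dict[str, int]


def correctness(results: List[Result]) -> float:
    """Mean over functions of (perfect + incorrect) / identified."""
    return sum((r["perfect"] + r["incorrect"]) / r["identified"]
               for r in results) / len(results)


def perfection(results: List[Result]) -> float:
    """Mean over functions of perfect / source."""
    return sum(r["perfect"] / r["source"] for r in results) / len(results)


# Illustrative counts for two functions of one protocol.
sample = [
    {"source": 8, "identified": 8, "perfect": 8, "incorrect": 0},
    {"source": 8, "identified": 7, "perfect": 6, "incorrect": 1},
]
print(correctness(sample), perfection(sample))
```

In the second function two source fields were merged into one with a wrong length, so correctness stays at 1.0 while perfection drops to 0.875.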


Summary. We have evaluated 11, 15 and 20 different functions for Modbus/TCP, 
S7Comm and FINS, respectively, which covers almost all of the functions 
supported by them. Table 1 shows the detailed results for each protocol. For 
Modbus/TCP and S7Comm, Wireshark performs better than IPSpex because these 
protocols are widely used and their formats are elaborately defined in 
Wireshark. However, the FINS protocol, which is only used by OMRON controllers, 
is less known to the public. Some of its seldom-used functions are not well 
parsed by the latest Wireshark, whereas IPSpex handles them. Therefore, IPSpex 
can provide an alternative way to obtain the specifications of undocumented ICS 
protocols. The evaluation results are analyzed in depth as follows. 


Table 1. Statistical Results of Indicators 

                            Wireshark                            IPSpex 
Protocol    Func.  |F_s|    |S_p|  |S_e|  |S_f|  Corr.  Perf.    |S_p|  |S_e|  |S_f|  Corr.  Perf. 
Modbus/TCP  11     8.00     8.00   0.00   0.00   1.00   1.00     7.63   0.18   0.00   1.00   0.96 
S7Comm      15     20.0     20.0   0.00   0.00   1.00   1.00     17.6   1.53   1.46   0.94   0.88 
FINS        20     14.1     13.25  0.35   0.00   1.00   0.94     14.0   0.01   0.01   0.99   0.99 


5.3 Modbus/TCP 


We evaluate 11 functions of Modbus/TCP supported by libmodbus v3.1.6. IPSpex 
tests all the functions of libmodbus, as shown in Fig. 4, and achieves a 
correctness of 100% and a perfection of 96%. After inspecting the memory 
traces, we find that the fields in $S_e$ are caused by semantic field 
segmentation and semantic field correlation. 

Semantic field segmentation refers to the situation in which one field is 
parsed based on another field in the same message. For example, the fields in 
$S_e$ of the “write registers” function are the list of register values to 
write. After checking the memory trace of the write registers function, we find 
that these values lie in a continuous buffer, which causes IPSpex to aggregate 
them into one field. However, the list of values is defined as a number of 
separate fields that is determined by the “Word Count” field. This implicit, 
semantics-aware field segmentation is hard for IPSpex to recognize. 

Fig. 4. Indicators for Functions Defined in Libmodbus 

Semantic field correlation means that some fields represent related physical 
meanings, such as the message length. For instance, we find that the fields in 
$S_e$ of the “read and write registers” function are the lengths to write, 
which are defined as the two-byte “Write Word Count” and the one-byte “Byte 
Count”. Figure 5 shows the instruction trace that calculates the values of 
these fields. The register “edx” is the source of the length value; “dl” holds 
the low byte while “cl” holds the high byte. However, “al” is a copy of “dl” 
and is doubled before being written to the “Byte Count” field. IPSpex merges 
these two fields because they come from the same memory location. Binary-based 
methods cannot recognize this semantic difference, which leads to the 
unexpected result. 


    mov edx, dword ptr [ebp + 0x10] 
    mov ecx, edx 
    sar ecx, 8 
    mov al, dl 
    mov byte ptr [ebp + esi - 0x106], cl    ; Write Word Count (high byte) 
    add al, al 
    mov byte ptr [ebp + esi - 0x105], dl    ; Write Word Count (low byte) 
    xor ecx, ecx 
    mov byte ptr [ebp + esi - 0x104], al    ; Byte Count 


Fig. 5. Assembly Code of Read and Write Registers Function 
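
The effect of this instruction sequence can be replayed in a few lines of Python. The sketch is only a re-statement of the listing for non-negative word counts; the function name is hypothetical.

```python
from typing import Tuple


def encode_lengths(word_count: int) -> Tuple[bytes, int]:
    """Replay the Fig. 5 instructions for a non-negative word count."""
    edx = word_count & 0xFFFFFFFF      # mov edx, dword ptr [ebp + 0x10]
    cl = (edx >> 8) & 0xFF             # mov ecx, edx ; sar ecx, 8
    dl = edx & 0xFF                    # low byte of edx
    al = (dl + dl) & 0xFF              # mov al, dl ; add al, al
    write_word_count = bytes([cl, dl])  # two-byte field, high byte first
    byte_count = al                     # one-byte field
    return write_word_count, byte_count


print(encode_lengths(3))
```

Both output fields are computed from the single value loaded into edx, which is why a purely data-flow-based analysis sees one source location for three consecutive message bytes.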


5.4 S7Comm 


We evaluate 15 functions of S7Comm supported by snap7 v1.4.2. For the S7Comm 
protocol, Wireshark defines 12 main functions, and there are many userdata 
sub-functions under the “CPU services” function. IPSpex has tested all of the 
main functions supported by snap7 and some sub-functions under the “CPU 
services” function, as shown in Fig. 6, and achieves a correctness of 94% and a 
perfection of 88%. The main reasons why IPSpex identifies fields improperly are 
an irregular function code and strings. 
Fig. 6. Indicators for Functions Defined in Snap7 


The function code of “CPU services” is irregular in that it is 3 bytes long, 
while the others are 1 byte long. From the memory trace we find that these 
three bytes are assigned different constant values separately, which leads 
IPSpex to split them into separate fields. 

The other common fields in $S_f$ and $S_e$ come from printable strings in 
messages, such as the name and number of a block, the length of a program and 
the command of the program invocation services. The memory trace shows that 
the program sets part of a string to constant values and the rest to variable 
values, which leads IPSpex to assign different types of program locations to 
them and to divide them into different fields. Nevertheless, printable strings 
have self-explanatory meanings, so they are easy to identify and to correct 
manually. 


5.5 FINS 


Libfins defines more than 60 functions, some of which are similar. We manually 
select those whose fields carry abundant information. For instance, we drop 
functions such as “finslib_link_unit_reset” that have a body of zero length. We 
also drop some functions that have the same parameters as others, such as 
“finslib_access_log_read”. In addition, libfins sends the FINS/TCP header and 
the FINS body separately, because the header has a fixed format. In total, we 
have selected 20 functions for evaluation, as shown in Fig. 7, and IPSpex 
achieves a correctness of 99% and a perfection of 99%. 

The only two functions that are not well parsed by IPSpex are “read file” and 
“write file”. After checking the source code of libfins, we discover that the 
filename parameter is encoded in an expanded 8.3 filename format, padded with 
spaces where necessary. Indeed, from the memory trace we discover that the 
filename is divided into two parts: the first part consists of 8 variable bytes, 
while the second part consists of 4 space bytes. Therefore, IPSpex regards it as 
two fields, which leads to the wrong results. 
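
The following sketch shows why such a filename splits into a variable part and a constant part. It is a guess at the layout based only on the description above (8 name bytes followed by 4 bytes for the dot and extension, space padded); the function `expand_83` is hypothetical and is not the libfins routine.

```python
def expand_83(filename: str) -> bytes:
    """Pad a short filename to a 12-byte expanded 8.3 layout (assumed)."""
    name, _, ext = filename.partition(".")
    name_part = name[:8].ljust(8)            # 8 bytes taken from user input
    if ext:
        ext_part = ("." + ext[:3]).ljust(4)  # dot plus up to 3 extension bytes
    else:
        ext_part = " " * 4                   # constant padding only
    return (name_part + ext_part).encode("ascii")


print(expand_83("DATA"))
print(expand_83("LOG.TXT"))
```

For a name without an extension, the last four bytes are constant spaces while the first eight depend on the input, so a trace-based analysis sees two different kinds of sources within one logical field.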
Fig. 7. Indicators for Functions Defined in Libfins 
In addition, we discover that some functions are not completely dissected by 
the latest Wireshark. On the one hand, the function code 0x2140 of “read access 
log” is not defined in the omron-fins dissector of Wireshark, so Wireshark is 
unable to dissect the bytes that follow the function code. IPSpex, however, can 
identify the parameter fields in the access log reading message, such as the 
start address and the number of bytes to read. On the other hand, although most 
of the function codes in FINS are well defined by Wireshark, the parameters of 
some functions are partially absent. IPSpex has recognized these parameter 
fields, and they are consistent with the field semantics provided by the source 
code of libfins. Table 2 lists these functions and the semantics of the 
parameters that Wireshark did not recognize. 


Table 2. The Field Semantics for FINS 

Function Description    Parameters 
read message            message mask 
read file name          disk identifier, path, length of path, number of files, 
                        start number of file 
read file               disk identifier, file name, path, length of path, 
                        start position of file, number of bytes 
write file              path, length of path, data to write 
delete files            path, length of path 
force bit               area, main-address, sub-address 
6 Application 


To measure the effectiveness and efficiency of IPSpex in black-box fuzzing, we 
apply IPSpex to the vulnerability discovery on a real-world undocumented ICS 
protocol, UMAS, a management protocol used by Schneider Electric Modicon 
PLCs or PACs. 


6.1 Target 


We apply IPSpex to test a Schneider Electric Modicon M580 (BMEP584020) 
programmable automation controller (PAC) with firmware version SV2.70. UMAS 
uses the reserved function code 0x5a of the common Modbus/TCP protocol 
specification, as shown in Fig. 8. The engineering software we use to 
communicate with the PAC is Unity Pro, which enables various actions such as 
obtaining the CPU status and reading the program running on the device. 


[Figure 8: layout of a UMAS message. The Modbus/TCP header (Transaction 
Identifier, Protocol Identifier, Length, Unit Identifier) is followed by the 
UMAS identifier (0x5a), the UMAS Session ID and the UMAS Data.] 

Fig. 8. UMAS Message Encapsulated in Modbus/TCP 
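
The encapsulation can be sketched with the standard `struct` module. The Modbus/TCP header layout (2-byte transaction identifier, 2-byte protocol identifier, 2-byte length, 1-byte unit identifier) follows the public Modbus/TCP specification; the 1-byte width of the session ID and the opaque `umas_data` payload are assumptions made for this illustration, since the figure does not state field widths.

```python
import struct

UMAS_FUNCTION_CODE = 0x5A  # reserved Modbus function code used by UMAS


def build_umas_frame(transaction_id: int, unit_id: int,
                     session_id: int, umas_data: bytes) -> bytes:
    """Wrap UMAS data in a Modbus/TCP frame (session ID width is assumed)."""
    pdu = bytes([UMAS_FUNCTION_CODE, session_id]) + umas_data
    # The Modbus/TCP length field counts the unit identifier plus the PDU.
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


frame = build_umas_frame(transaction_id=1, unit_id=0,
                         session_id=0, umas_data=b"\x02")
print(frame.hex())
```

Because the length field depends on the size of the UMAS data, a fuzzer has to recompute it for every mutated test case, otherwise the target may reject the frame before the UMAS layer is reached.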


6.2 Procedure 


According to our protocol specification extraction method, we first use Unity 
Pro to send a specific command and capture the UMAS network traffic using 
Wireshark. The length and function code fields can be identified by comparing 
the distribution of each byte using the aforementioned method. Second, we 
capture the execution traces that record the construction of the target message 
and extract the message format. Note that the target message is saved for the 
next stage. Finally, we use boofuzz, an open-source network protocol fuzzing 
framework, to generate test cases for fuzzing the UMAS protocol. The UMAS 
message is encapsulated in the Modbus/TCP message. For comparison, we use 
boofuzz-byte (i.e., we treat each byte as a single field and perform 
mutations). The default “fullrange” option is set to “False” for all dynamic 
fields. 

Although it is efficient to use an algorithm that selects seeds from many UMAS 
target messages for fuzzing [36], it is hard to trigger Unity Pro to generate a 
large and diverse set of UMAS network packets, because some fields in these 
messages also depend on the hardware device (e.g., the default memory address 
for input). Therefore, we use only one target message as the fuzzing seed for 
each function code. The constant values found in the extracted protocol 
specification are set as constants in the message template. 
Table 3. Description of Vulnerable Functions 

Name              Code  Length  Fields  CVE ID         IPSpex   Byte 
GetRawApplilnfo   0x20  13      7       CVE-2018-7843  4s       7s 
StartRestoreData  0x21  14      8       CVE-2018-7856  1min2s   None 
StartSaveData     0x24  13      8       CVE-2019-6828  1min8s   None 
StartRestoreData  0x25  18      10      CVE-2018-7857  1min15s  None 
WriteSync         0x50  38      26      Confirmed      10s      None 
Unknown           0x50  27      19      None           8s       None 
SetBreakPoint     0x60  20      11      CVE-2018-7855  6s       8s 


6.3 Result 


In total, we have found five 1-day vulnerabilities and two 0-day 
vulnerabilities: six that cause the M580 PAC to enter a non-recoverable fault 
state, and one that puts the I/O module into an error state. After searching 
the public vulnerability report [6], we find similar vulnerability descriptions 
for five of these functions, which differ only in the product name. The 
detailed descriptions of the vulnerable functions are shown in Table 3. The 
function names are identified from the call-stack traces, while the CVE ID 
refers to the entry whose description is similar to the corresponding function. 
The length and number of fields are attributes of the test cases that trigger 
the vulnerabilities. ‘None’ in the Byte column means that the test cases were 
exhausted without finding a vulnerability. 

Although existing smart black-box fuzzers support some mutation strategies to 
improve efficiency, the vulnerabilities they can discover are largely limited 
without the message format. The root cause is that it is extremely hard to 
locate the offset and length of the vulnerable field in advance. In our 
experiment, CVE-2018-7843, CVE-2018-7855 and the two 0-day vulnerabilities are 
each caused by a single vulnerable field. However, the vulnerable field is 4 
bytes long for the two 0-day vulnerabilities, rather than 1 byte as for the two 
CVEs, which makes the boofuzz-byte fuzzer unable to find the two 0-day 
vulnerabilities. According to the public report, CVE-2018-7843 occurs because 
the PAC does not check the offset parameter when receiving a block reading 
command. CVE-2018-7855 occurs because the PLC does not verify the invalid 
program address. 
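
The difference between field-aware and byte-level mutation can be illustrated with a small generator. This is a simplified stand-in and not boofuzz's mutation engine: the boundary values, the little-endian encoding and the field list are assumptions chosen for the example.

```python
from typing import Dict, Iterator, List, Tuple

Field = Tuple[int, int]  # (offset, length) taken from the extracted format

# Illustrative boundary values per field length (not boofuzz's actual list).
BOUNDARY: Dict[int, List[int]] = {
    1: [0x00, 0x7F, 0x80, 0xFF],
    2: [0x0000, 0x7FFF, 0x8000, 0xFFFF],
    4: [0x00000000, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF],
}


def field_mutations(seed: bytes, fields: List[Field]) -> Iterator[bytes]:
    """Replace one whole field at a time with a boundary value."""
    for offset, length in fields:
        for value in BOUNDARY.get(length, []):
            yield (seed[:offset] + value.to_bytes(length, "little")
                   + seed[offset + length:])


def byte_mutations(seed: bytes) -> Iterator[bytes]:
    """Byte-level baseline: every byte is treated as its own field."""
    return field_mutations(seed, [(i, 1) for i in range(len(seed))])


seed = bytes(6)
fmt = [(0, 2), (2, 4)]                    # one 2-byte and one 4-byte field
trigger = b"\x00\x00\xff\xff\xff\xff"     # needs all 4 bytes changed at once
print(trigger in set(field_mutations(seed, fmt)))
print(trigger in set(byte_mutations(seed)))
```

The byte-level baseline changes a single byte per test case, so a value that requires all four bytes of a field to change together is never produced, while the field-aware generator reaches it directly.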

In particular, some vulnerabilities can only be triggered when more than one 
field equals a specific value. According to the message format, we set the 
4-byte fields to boundary values in advance and then test the other fields. 
CVE-2018-7856, CVE-2018-7857 and CVE-2019-6828 all belong to this type. 
CVE-2018-7856 occurs because the PAC does not check the address parameter for a 
specific object type. Similarly, the PAC does not check the offset parameter 
for a specific block number, which results in CVE-2018-7857. CVE-2019-6828 
occurs because the PAC does not check one undocumented field for a specific 
type of position variable. 
7 Discussion 


In this section, we discuss the limitations in our method and present several 
improvements for future work. 


Stability. Because IPSpex captures execution traces with a Pintool, which only 
supports executables written in C or C++, it cannot be applied to engineering 
software built on other platforms such as .NET. Another limitation lies in the 
complexity of engineering software, which may use multiple processes to handle 
network data transmission. In this case, the construction and the sending (or 
receiving) of network packets are handled in different modules. We are 
considering using dynamic binary instrumentation to recognize the core 
communication component, and combining the results with under-constrained 
symbolic execution to drive the execution of that component. After obtaining 
the execution traces, we can use IPSpex to extract the protocol specification. 

Another issue with ICS protocols is encryption. To the best of our knowledge, 
ICS protocols use various authentication mechanisms, such as the session ID in 
the Siemens S7CommPlus P2 version [14], the SHA-1 algorithm and RSA in Rockwell 
CIP [23] and a customized algorithm in the Mitsubishi MELSEC protocol [32]. 
Data encryption, however, is rarely adopted because of the inevitable overhead 
of encrypting and decrypting data. We can identify these authentication 
processes through open-source software libraries [23] or through the statistics 
of arithmetic and bitwise instructions [43], so that we can pass the 
authentication before the fuzzing process starts. 


Robustness. IPSpex uses backward data flow tracking and sequence alignment 
algorithms to extract the message format from instruction traces. In practice, 
instruction traces usually differ slightly under different compiler options. 
These differences may lead to incorrect results in the identification of 
strings and continuous constant values. The influence in our experiments is 
relatively small. Nevertheless, we think this problem needs a comprehensive 
explanation, and we will explore it further in the future. 


8 Conclusion 


In this paper, we present the IPSpex method to improve the efficiency of 
black-box fuzzing. From execution traces such as the instruction trace and the 
call stack, IPSpex extracts the format of the messages sent by engineering 
software using backward data flow tracking and sequence alignment algorithms. 
We combine IPSpex with boofuzz to improve the efficiency of fuzzing. We have 
evaluated IPSpex on Modbus/TCP, S7Comm and FINS, on all of which it achieves 
high correctness and perfection compared to Wireshark. For vulnerability 
discovery on the undocumented UMAS protocol, we find five 1-day vulnerabilities 
and two 0-day vulnerabilities in total, each within minutes. Our future work 
focuses on two aspects: improving the stability and robustness of IPSpex, and 
exploring other applications of IPSpex. 
Abstract. Probe requests help mobile devices discover active Wi-Fi networks. 
They often contain a multitude of data that can be used to identify 
and track devices and thereby their users. The past years have been a 
cat-and-mouse game of improving fingerprinting and introducing coun- 
termeasures against fingerprinting. 

This paper analyses the content of probe requests sent by mobile 
devices and operating systems in a field experiment. In it, we discover 
that users (probably by accident) input a wealth of data into the SSID 
field and find passwords, e-mail addresses, names and holiday locations. 
With these findings we underline that probe requests should be consid- 
ered sensitive data and be well protected. To preserve user privacy, we 
suggest and evaluate a privacy-friendly hash-based construction of probe 
requests and improved user controls. 


Keywords: Probe Requests - Wi-Fi Tracking - Privacy Preserving 
Technologies 


1 Introduction 


To establish a Wi-Fi connection, mobile devices can transmit so-called probe 
requests to receive information about nearby Wi-Fi networks. An access point 
observing a probe request is led to reply with a probe response, thereby ini- 
tiating a connection between both devices. While probe requests are used to 
establish a connection between a mobile device and an AP, they also serve as 
a means to track, trilaterate and identify devices for attackers who passively 
sniff network traffic. They can contain identifying information about the device 
owner depending on the age of the device and its OS. One such piece of 
information is the preferred network list (PNL), which contains networks 
identified by their so-called Service Set Identifiers (SSIDs). Around 23% of 
the probe requests contain SSIDs of networks the devices were connected to in 
the past, according to our measurements. There exist online mapping services 
like WiGLE¹, which provide information about geographical locations where SSIDs 
have been observed. A casual observation of the networks available in any given 
residential area returns a multitude of personalised, often descriptive SSIDs 
used for private networks. Therefore, a query for an SSID might reveal home or 
work addresses, or other visited locations where users connected to Wi-Fi, and 
can thereby reveal very personal information about them. 

¹ https://www.wigle.net. 

Another application in which probe requests are frequently used is the tracking 
of devices in stores or cities: as probe requests are sent rather frequently, 
they can be used to trilaterate the location of a device with an accuracy of up 
to 1.5 m [24]. Trilateration can also be used to follow the movements of a 
device, and thereby its user, over a longer period of time, and to track them 
through a store or city [23]. This is in fact already employed in 23% of the 
stores [1]. Companies and cities that conduct Wi-Fi tracking take the legal 
position that only the MAC address contained in probe requests is considered 
personal data according to GDPR Article 4(1) [8,10], which protects personal 
data from unlawful collection and processing. They therefore maintain that if 
the MAC address is anonymised before storage, the collection and evaluation of 
probe requests is GDPR compliant [29]. The randomisation of MAC addresses 
mitigates linkability via this element. Instead, we focus on the privacy risks 
that originate from the list of SSIDs stored in the PNL and sent in probe 
requests. We provide empirical evidence that probe requests should also be 
considered personal data on the basis of their SSID field, which we find can 
even contain directly identifying information. We hope to thereby stress the 
need for a more thorough legal evaluation. We additionally propose changes to 
the handling of the SSID field and to mobile OS behaviour to enhance the 
privacy of users and decrease their trackability by passive sniffers. 

To this end, we contribute the following: 


— We conduct a field experiment in a German city, recording probe requests of 
passersby. 

— We evaluate their content, with special regard for SSIDs and identifying infor- 
mation. 

— We summarise the state of probe requests for different OS versions. 

— We propose a hashing of non-wildcard SSIDs in probe request to protect their 
confidentiality against passive observers. 

— We propose changes to the UI design of Wi-Fi selection and PNL manage- 
ment. 


This paper is structured as follows: in the next section, we first provide a 
background on network discovery and privacy implications of MAC addresses. 
Here, we also compare the privacy features of various Android and iOS ver- 
sions. Thereafter, we present related work in Section 3. Section 4 explains the 
experimental setup and our handling of ethical and privacy concerns. We then 
present the results of our data analysis in Section 5. Section 6 proposes mitigation 
approaches on both protocol and user interface level. In Section 7, we discuss the 
findings. Finally, Section 8 concludes the paper. 
2 Background 


In this section we define the underlying technological background of our work. 


2.1 Network Discovery in 802.11 


To establish a Wi-Fi connection between a mobile device and an access 
point (AP), both devices have to discover each other; either via active or passive 
discovery: 

In passive discovery, an AP advertises itself by sending out beacons containing 
its SSID, MAC address, the cipher suites it supports and a few other 
elements [17]. These beacons are sent at intervals of approximately 
100 ms [12], and mobile devices can respond with Wi-Fi association frames. 

In an active discovery, mobile devices broadcast probe requests to find APs 
they have previously associated with. Active discovery is also required to connect 
to so-called hidden networks, for which the AP does not advertise the network, 
i.e., does not send out beacons. Probe requests sent by most modern devices 
are typically broadcast and contain the empty wildcard in the SSID field. APs 
receiving a probe request respond with a probe response directed at the sender 
of the probe request. The probe response contains the SSID of the AP and 
additional information like supported rates and various capabilities. 

The reason both active and passive discovery mechanisms are used is that 
while APs advertise themselves constantly, scanning for beacons can be rather 
energy consuming and slow. Additionally, a mobile device scanning for beacons 
on one channel with a certain frequency might miss beacons sent on another 
channel. A device actively probing for APs just has to turn on the Wi-Fi radio 
until it receives the probe response, which typically takes only a few millisec- 
onds [12]. On the other hand, active discovery requires the transmission of pack- 
ets containing information about the mobile device. While probe requests sent 
by devices running older OS might contain SSIDs of one or more APs the device 
has previously been connected to, newer devices transmit only the SSIDs of 
hidden networks to improve user privacy and make the device less traceable (cf. 
Section 2.3). Additionally, they omit the real MAC address of the device, instead 
sending a randomised MAC address. 

Probe requests are sent in bursts, each burst containing several probe requests 
sent on some or all of the 14 channels of the 2.4 GHz spectrum (and 
additionally the 5 GHz spectrum if applicable) within a short time span of just 
a few milliseconds. Whether or not MAC address randomisation is employed, all 
packets in a burst are sent from the same MAC address. 


2.2 Privacy Implications 


A MAC address consists of 6 bytes typically represented in hexadecimal notation, 
separated by colons, e.g. 01:23:45:ab:cd:ef. The first three bytes are called 
the Organizationally Unique Identifier (OUI) and are typically assigned to the 
manufacturer of the devices. The last three bytes identify the Network Interface 
[Figure 1: Wireshark capture filtered with wlan.fc.type_subtype==0x0004, with 
the columns No., Time, Source and Info. It lists probe requests for the SSIDs 
WootWootKarneval, TestingThisCrazyIdea and alalalalalong as well as the 
wildcard (broadcast) SSID.] 

Fig. 1. Three bursts of probe requests sent from the same device. Three different SSIDs 
and the wildcard SSID, an empty string, are broadcast. Note that the starting sequence 
number (SN) in the info field is randomised per burst as well. 


Controller (NIC), produced and assigned by the manufacturer. The OUI contains
additional information encoded in the two least significant bits (U/L and I/G) of
the most significant byte (01 in our example): The I/G bit is the least significant
bit and specifies whether the recipient is a unicast or a multicast address. The
second-least significant bit, the U/L bit, clarifies whether the address is locally
or globally administered, with a globally administered address being a unique
identifier for the physical device, while a locally administered address
temporarily overwrites the unique global one in software [34]. Older devices use
their universal address to broadcast probe requests, which makes them easily
trackable. To protect the privacy of users and prevent device tracking, probe
requests are often sent from locally administered addresses, employing a technique
called MAC address randomisation. Here, the MAC address commonly changes between
two bursts, such that each burst is sent from a new, random MAC address. This
behaviour was first introduced in iOS 8 in 2014 [11] and in Android 8 in 2017 [13],
although the first implementations suffered from information leaks: it was often
possible to track devices despite the use of MAC address randomisation [32], for
example via the SSIDs contained in their probe requests. If a device transmits
SSIDs, a burst contains either all of them or a subset, with every packet
requesting one SSID. Figure 1 shows the capture of three bursts of probe requests
sent from the same device employing MAC address randomisation but transmitting
SSIDs.
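
As an illustration of the two flag bits described above, the following minimal Python sketch reads them from the first byte of a MAC address. The helper name mac_flags is hypothetical and chosen for this example only; the sample address is the source address of the first burst in Fig. 1.

```python
def mac_flags(mac: str) -> dict:
    """Return the I/G and U/L flags encoded in the first byte of a MAC address."""
    first_byte = int(mac.split(":")[0], 16)
    return {
        # I/G bit (least significant bit): set for multicast, clear for unicast.
        "multicast": bool(first_byte & 0x01),
        # U/L bit (second-least significant bit): set for locally administered.
        "locally_administered": bool(first_byte & 0x02),
    }


if __name__ == "__main__":
    # 0x46 = 0b01000110: I/G bit clear, U/L bit set.
    print(mac_flags("46:8e:45:03:1c:02"))
```

A set U/L bit is what one would expect from a randomised address, since such addresses are locally administered.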

While omitting SSIDs and employing MAC address randomisation renders a 
device less trackable, other fields included in probe requests can spoil the effect: 
if the sequence number (SN) is not randomised, it is still trivial to follow a
device over time. Therefore, many devices randomise their sequence number at the
start of every burst, as can also be observed in Fig. 1.

The newer a device and its OS are, the more information is omitted and the more
fields are randomised in the probe requests. All the same, various papers still
describe how even modern devices can be fingerprinted due to other information
contained in them, e.g. in the Information Elements (IE): These non-mandatory
parameters contain information on supported rates, network capabilities, and more.
Combining the IE parameters, the signal strength and, in some cases, the sequence
number makes it possible to fingerprint individual devices despite MAC address
randomisation [30,32]. While efforts are made to reduce the fingerprint of modern
devices, the owners of older devices that do not receive patches introducing MAC
address randomisation, sequence number randomisation and SSID omission can easily
be tracked.


2.3 Differences Between Android and iOS Versions 


Table 1 shows the differences between iOS and Android in supporting Wi-Fi-related
privacy features. We compare iOS versions 8, 10, 14 and 15 and Android 8 to 12.
Their combined market share comprises about 90% of the devices [26,27], which
makes them representative and provides a good overview of the changes in recent
years. In the following, we elaborate on the various features listed in Table 1.


Table 1. Privacy features for probe requests in different mobile OSs.


                               Apple iOS                    |  Android
                               8      10     14     15      |  8      9      10     11     12

Market share in %              <0.1   1.0    35.9   53.4    |  10.2   13.5   27.0   35.4   1.9

Randomised MAC ...
 - while probing               ✓      ✓      ✓      ✓       |  ✓      ✓      ✓      ✓      ✓
 - per connected SSID          -      -      ✓      ✓       |  -      (-)*   ✓      ✓      ✓
 - after resetting settings    -      -      ✓      ✓       |  -      -      (-)*   (✓)*   (✓)*
New random MAC after ...       -      -      -      6w      |  -      -      -      (-)†   †
Private Address by default     -      -      ✓      ✓       |  -      -      ✓      ✓      ✓
Modify distant Network         -      -      -      -       |  ✓      ✓      ✓      ✓      ✓
Manually added == hidden       Automatic detection of hidden|  ✓      -      -      -      -
Probe with SSID                Only if hidden detected      |  If manually added (8); if explicitly declared hidden (9 to 12)


*: Only choosable via Developer Options

†: If use of non-persistent MAC is chosen via Developer Options, a new MAC is set (a) for every
new connection establishment, (b) every 24 hours, unless a connection is still established, or (c) if
both the DHCP lease has expired and the device has been disconnected for 4 hours


All listed versions use MAC address randomisation while probing [13,19,34].
In Android 9 devices, users can choose via Developer Options whether a randomised
MAC address should be used while connected. Since Android 10 and iOS 14, private
addresses are used by default: They all employ persistent random MAC addresses
while connected. Starting with Android 11, one can choose via Developer Options
to use non-persistent randomisation per stored SSID during connection. If
non-persistent MAC addresses are used in Android, the MAC address is re-randomised
(a) with every new connection establishment, (b) every 24 h, but without
disrupting an existing network connection to switch to a new MAC address, or (c)
if the DHCP lease has expired and the device has been disconnected for at least
4 h. Nevertheless, to date, no Android device automatically receives a new random
MAC address after a certain amount of time by default, i.e. without modifying the
Developer Options. Additionally, with persistent private MAC addresses in Android,
the default behaviour is to persist a MAC address per SSID even after resetting
network settings. This is different in iOS: Starting with iOS 14, removing and
adding a network again causes a reset of the persistent address for that network.
In iOS 15, devices additionally receive a new address when they have not been
connected for more than 6 weeks [2,14,34].

All of the listed Android versions offer to remove any saved network at any given
time. This is not the case in iOS: Here, a network can only be removed from the
device while in physical proximity to it or by modifying the iCloud Keychain from
a MacBook. Without access to a MacBook or physical proximity to the network,
it cannot be removed without resetting the entire network settings [21].

When adding a network manually, iOS verifies whether it is a hidden net- 
work or not, whereas Android 8 (and earlier) automatically assumes that manu- 
ally added networks are hidden networks. Therefore, if a network was manually 
added, Android 8 devices send the SSID in probe requests, while newer Android 
versions only do so if the added networks were explicitly declared hidden (cf. 
Fig. 2b in Section 6.2). In iOS, the SSID is only used in probe requests if the 
network is detected to be a hidden network [3, 20]. 


3 Related Work 


In 2013, Cunche et al. showed how to link various devices by their transmitted
SSIDs and inferred relationships between users [5]. This work was published
before MAC address randomisation was deployed, when the free transmission of
SSIDs was the typical means of network discovery. The authors propose the use
of a geolocation-based service discovery instead of active discovery via probe
requests. In 2014, MAC address randomisation was first discussed [34] and
subsequently tested and published [4]. It was meant as a means to increase
privacy, but since it lacked standardisation, all implementations were vulnerable
to attacks [32]. Since then, extensive work has been published on probe requests,
MAC address randomisation and fingerprinting devices despite MAC address
randomisation: In 2015, Freudiger et al. gave an overview of the number of probe
requests sent by different devices and analysed the effectiveness of the MAC
address randomisation employed in different devices [12]. On a positive note,
Freudiger pointed out that recent mobile operating systems only probe for SSIDs
of hidden networks. In another influential publication, Vanhoef et al. investigated
how well devices can be tracked by combining various fields in probe requests [32].
They also present two attacks that can be used to reveal the real MAC address
of a device and conclude that MAC address randomisation is insufficient to
impede tracking.

Various other papers in the field attempt to associate probe requests from
randomised MAC addresses: Gu et al. use deep learning methods and suggest
encrypting probe requests using the symmetric stream cipher ChaCha20 to protect
them from attackers [16]. Tan et al. use minimum-cost flow optimisation to
associate frames and reach an accuracy of more than 80% [30]. As the use of
hidden networks and the number of devices broadcasting SSIDs are decreasing,
both papers put only a minor focus on the transmitted SSIDs.

In 2019, Dagelić et al. [6] observe the occurrence of SSIDs in probe requests
at a music festival between 2014 and 2018 and show how easily devices can be
tracked via probe requests if the devices are fingerprintable. Over the years,
the number of MAC addresses they observe increases while the number of SSIDs
decreases. They conclude that the use of MAC address randomisation is increasing,
as is the number of probe requests that contain the empty wildcard SSID.

An attempt at localising criminal groups via probe requests was published by
Zhao et al. in 2019 [33]. They build a database of SSIDs similar to WiGLE
and monitor probe requests in different locations in search of specific SSIDs.
This methodology allows them to find and track devices belonging to a targeted
group.

With respect to protecting the content of probe requests, Pang et al. [22] 
published an architecture called Tryst in 2007 to conceal confidential informa- 
tion during service discovery. Tryst makes use of access control primitives using 
symmetric encryption, with which it reveals information to the correct access 
point while concealing all information not directed at it. It remains unclear how 
exactly the various SSIDs present in the SEND primitive are concealed from 
everyone except for the intended recipient. They underline the privacy risks of 
both APs transmitting SSIDs and mobile devices transmitting probe requests 
by analysing geoinformation on SSIDs collected in a 2004 data set: they find 
that about a quarter of the devices probe for SSIDs that uniquely appear in just 
one city. While Pang et al. also perform geolocalisation of SSIDs as we do, to
the best of our knowledge, no publication to date analyses the content of the
SSIDs in probe requests as we do in this paper, nor does any propose hash-based
anonymisation of probe requests.

In the following, we first introduce the experiment and then strive to demon- 
strate the privacy implications of the use of such verbose devices on their users. 


4 Experimental Setup 


In this field experiment performed in November 2021, we recorded probe requests
in a busy pedestrian zone in the centre of a German city, over the period of one
hour, three times in total. We used six off-the-shelf antennae: three for channels
1, 6 and 11 in the 2.4 GHz spectrum and three for channels 36, 40 and 48 in
the 5 GHz spectrum. Since our particular focus lies on privacy violations arising
from the information contained in the SSID field, we evaluate it with respect to
the following:

— The number of probe requests containing non-empty SSIDs.

— The number of SSIDs sent per burst.

— Privacy implications of transmitted SSIDs: what potentially personal data
can be gleaned from the data set?

— The use of MAC address randomisation.


We calculate the intersection between the data sets of two different days and
remove all probe requests by devices that appear in both of them. That way,
we strive to filter out the permanent devices in the vicinity of the measurement
and obtain a clearer view of devices more likely to represent human passersby.
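
The filtering step can be sketched as a set intersection. The following is a minimal illustration, not a description of the actual evaluation scripts: it assumes that a device is identified by its source MAC address and that every probe request is represented as a dictionary with a "mac" key. Both the data layout and the function name are hypothetical.

```python
def remove_recurring_devices(day_a, day_b):
    """Drop all probe requests whose source MAC appears in both captures.

    Each capture is a list of dicts with at least a "mac" key (assumed layout).
    """
    macs_a = {probe["mac"] for probe in day_a}
    macs_b = {probe["mac"] for probe in day_b}
    recurring = macs_a & macs_b  # devices seen on both days, likely stationary

    def keep(probes):
        return [probe for probe in probes if probe["mac"] not in recurring]

    return keep(day_a), keep(day_b)


if __name__ == "__main__":
    monday = [{"mac": "02:00:00:00:00:01"}, {"mac": "00:11:22:33:44:55"}]
    tuesday = [{"mac": "00:11:22:33:44:55"}, {"mac": "02:00:00:00:00:02"}]
    print(remove_recurring_devices(monday, tuesday))
```

Devices that randomise their MAC address never appear in the intersection under this assumption, so only devices with a stable address, such as stationary equipment, are filtered out.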


4.1 Potential Ethical and Privacy Concerns 


A modern smartphone might use MAC address randomisation and refrain from 
transmitting SSIDs and thereby protect the identity of its user and render itself 
less trackable. Older devices are often less privacy sensitive, transmit their real 
MAC address and maybe even known SSIDs. This data can be considered per- 
sonal data and should therefore only be collected and stored with particular care 
for the device owner’s privacy. To ensure ethical data aggregation, we submitted 
our study for approval to the ethics committee of the Informatics department 
of the University of Hamburg under case number 002/2021. The steps taken to
protect the privacy of the observed people, in accordance with the ethics
committee, can be found in Appendix A.1.


5 Data Analysis 


Our field data set contains 252 242 probe requests. We found that overall, 23.2%
of the probe requests contained SSIDs. Prior measurements by Dagelić et al. [6]
between 2014 (46.7%) and 2018 (12.9%), and by Vanhoef et al. [32] in 2016 (29.9%
to 36.4%), revealed higher numbers in 2014 and 2016, so a decline was to be
expected. At the same time, the records of 2018 and our measurements do not
match up. One explanation might be that a measurement at a music festival
records the probe requests of younger people with more recent devices that
already omit SSIDs in probe requests, whereas our measurements were taken in the
city centre of a tourist city around noon, where perhaps a larger percentage of
people had kept their (older) devices over a longer period of time. Our numbers
do, however, correlate with the market share of Android devices [26]: 10.2% of
Android devices use Android 8, and devices older than Android 8 account for up
to 12%. On these devices, manually added networks are considered hidden
networks [20] and are therefore probed for with their SSID. Seeing that Android
devices make up approximately 70% of the market share, while iOS devices make up
around 29% [28], the percentage of SSIDs in the data set is slightly higher than
expected.

During our measurement, 116 961 probes (46.4%) were captured in the
2.4 GHz spectrum, of which 28 836 (24.7%) contained at least one SSID. In the
5 GHz spectrum, we recorded 135 281 probes (53.6%), of which 29 653 (21.9%)
contained an SSID.


Table 2. Distribution of the number of SSIDs per cluster.


# SSIDs   1       2      3      4      5      6      7      8      >8
Share     67.8%   8.2%   4.0%   6.5%   2.7%   2.2%   0.9%   6.6%   1.1%

To prepare the probe requests for analysis, we first grouped all requests that 
were sent from the same MAC address within a period of four seconds into 
bursts. We then grouped all bursts into clusters of bursts, likely belonging to a 
single device, if their PNL was equal. We explicitly did not group requests into 
the same cluster if their PNL matched only partly to avoid misclassification of 
distinct devices with partly overlapping PNLs. At the same time, a cluster with 
only one SSID might contain requests from distinct devices. As can be seen in 
Table 2, 67.8% of the bursts contain just one SSID, while the remaining 32.2%
contain more than one SSID and are unique enough to track devices with them.
This is considerably less than Vanhoef et al. recorded in 2016 [32]: In their data 
set, 53% to 64.8% of the bursts contained a unique PNL. 
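
The two grouping steps can be sketched as follows. This is a simplified illustration under stated assumptions: probe requests are given as (timestamp, MAC, SSID) tuples, the four-second window is measured from the first request of a burst, and the PNL of a burst is taken to be the set of non-empty SSIDs it contains. The function names are hypothetical.

```python
from collections import defaultdict

BURST_WINDOW_S = 4.0  # requests from one MAC within this window form one burst


def group_bursts(probes):
    """Group (timestamp, mac, ssid) tuples into bursts of (mac, frozenset of SSIDs)."""
    by_mac = defaultdict(list)
    for ts, mac, ssid in sorted(probes, key=lambda p: p[0]):
        by_mac[mac].append((ts, ssid))

    bursts = []
    for mac, items in by_mac.items():
        start, ssids = items[0][0], set()
        for ts, ssid in items:
            if ts - start > BURST_WINDOW_S:
                # Window exceeded: close the current burst and open a new one.
                bursts.append((mac, frozenset(ssids)))
                start, ssids = ts, set()
            if ssid:  # the wildcard SSID is the empty string and is skipped
                ssids.add(ssid)
        bursts.append((mac, frozenset(ssids)))
    return bursts


def cluster_by_pnl(bursts):
    """Cluster bursts whose SSID sets are exactly equal; partial overlaps stay apart."""
    clusters = defaultdict(set)
    for mac, pnl in bursts:
        if pnl:
            clusters[pnl].add(mac)
    return dict(clusters)
```

Using the exact SSID set as the dictionary key mirrors the decision not to merge partly overlapping PNLs. A cluster that maps one PNL to several MAC addresses then indicates a device that is likely using MAC address randomisation.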

Of the probe requests containing an SSID, we identified at least 362 devices 
sending requests from multiple randomised MAC addresses. 542 devices used 
only one MAC address, and did not employ MAC address randomisation. 

In an additional evaluation, we found that the average number of probe
requests sent per unique MAC address was 4.8. This is, again, a plausible
decline in comparison to the captures by Dagelić et al. [6] in 2014 (24.1), 2015
(29.2) and 2017 (6.1), but is, again, higher than their 2018 count (2.6). For
packets including SSIDs, the average number of probe requests sent per MAC
address was 11.2, which confirms that these are likely older devices, fewer of
which employ MAC address randomisation.

In the following, we analyse the values contained in the SSID field of probe 
requests in order to estimate the privacy violations that occur in their commercial 
collection and analysis. 


5.1 SSID Contents 


As mentioned in Section 2.3, devices running Android 8 and lower treat manually 
added networks like hidden networks [20]. We conjecture that a lot of the SSIDs 
in our record originate from users trying to set up a network connection manually 
by entering both SSID and password through the advanced network settings, 
and, apparently mistakenly, enter the wrong strings as the SSIDs. The devices 
then retransmit the PNL with every probe burst. This results in significant 
additional information for fingerprinting devices compared to the empty wildcard 
SSID that would be transmitted otherwise. 

In the following, we elaborate on our findings from a manual as well as an
automated inspection of the encountered SSIDs.


Password Leaks in SSID Broadcasts. A small but significant number of
probe requests containing SSIDs potentially broadcast passwords in the SSID
field: We identified that 11.8% of the transmitted probe requests contain numeric
strings with 16 digits or more, which are likely the initial passwords of popular
German home routers (e.g., FritzBox or Telekom home router). This hypothesis
is supported by various cases in which the numeric strings follow a prefix such
as PW:, WPA: or (WPA/WPA2:). Also, we repeatedly found a 16- or 20-digit string
and, in the same burst, the same string again, but separated by a space, a dot
or a comma every four digits (e.g. 1234567812345678 and 1234 5678 1234 5678),
which is a typical way of improving the readability of the initial password on
the router case. The multitude of similar spellings supports our assumption that
the users tried to log in multiple times with different spellings of the same
credentials.
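
The observation about differently grouped digit strings can be checked mechanically. The sketch below is a hypothetical illustration rather than the tooling used for the evaluation: it strips the three separators named above and reports every long digit string that occurs in more than one spelling within the SSIDs of a single burst.

```python
import re
from collections import defaultdict

SEPARATORS = re.compile(r"[ .,]")  # space, dot or comma used to group digits


def strip_separators(ssid: str) -> str:
    """Remove the grouping separators so that all spellings compare equal."""
    return SEPARATORS.sub("", ssid)


def repeated_numeric_strings(burst_ssids, min_digits: int = 16) -> dict:
    """Map each long digit string to its spellings if it occurs more than once."""
    long_number = re.compile(r"[0-9]{%d,}" % min_digits)
    spellings = defaultdict(set)
    for ssid in burst_ssids:
        digits = strip_separators(ssid)
        if long_number.fullmatch(digits):
            spellings[digits].add(ssid)
    return {digits: seen for digits, seen in spellings.items() if len(seen) > 1}


if __name__ == "__main__":
    burst = ["1234567812345678", "1234 5678 1234 5678", "MyNetwork"]
    print(repeated_numeric_strings(burst))
```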

Leaking passwords in SSIDs is especially critical if, along with the password, 
the device also broadcasts the true SSID either correctly or with a mistype 
that can be used to infer the true SSID. Only 2.8% of the transmitted SSIDs 
classified as probable passwords were the only entry in the corresponding PNL. 
All other probable passwords were transmitted in bursts with other SSIDs that 
might contain the actual SSID belonging to the password. The assumption that 
the sniffed passwords correspond to SSIDs that were also transmitted could 
additionally be verified by setting up fake access points on the fly using the 
potential credentials we observed. As that would constitute an active attack on 
the devices and since we are determined to improve user security, not undermine 
it, we decided against employing fake-AP attacks. 

Moreover, Wi-Fi locations can often be gathered using Wi-Fi mapping services
like WiGLE, as we demonstrate in Section 5.2. Additionally, a sufficiently
determined attacker could follow the owner of a talkative device to their home
and try out the password in their home network.


Broadcasts of SSID Mistypes. In a manual analysis of the data set, we found
various devices that broadcast multiple different spellings of presumably the
same SSID. We assume that users manually entered them into their devices while
trying out different spelling and capitalisation variations, e.g., my network,
MY_NETWORK, MyNetwork. We quantify the number of mistyped SSIDs by calculating
the normalised edit distance between all SSIDs in a burst stemming from
a single device. The edit distance is the minimum number of operations
(insertions, deletions or substitutions of characters) needed to transform one
string into another. Since the edit distance can also be calculated over strings of
different length, we normalise the result with respect to the longer string,
i.e., we divide the edit distance by the maximum length of the two SSIDs.
Similar strings have a normalised edit distance close to 0, while it approaches
1 the more the strings differ. Before calculating the edit distance, all input
strings are transformed to lowercase, as otherwise the normalised edit distance
between SSIDs like NETWORK and network would evaluate to 1. We set the threshold
below which strings are considered similar, and thus treated as mistyped, to 0.3.
This way, strings that differ in less than 30% of their characters are considered
similar. We decided on such a high threshold to accommodate short SSIDs as well as
long ones. A manual inspection verified that the results fulfil the typo criteria
and nevertheless do not contain distinct network names like “Fritz!Box 7490”
and “Fritz!Box 7590”.
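
The metric described above can be written down compactly. The following sketch uses a textbook dynamic-programming implementation of the edit distance; the function names and the strict comparison against the threshold are choices made for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


def normalised_distance(a: str, b: str) -> float:
    """Edit distance on lowercased strings, divided by the longer string's length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def is_probable_typo(a: str, b: str, threshold: float = 0.3) -> bool:
    """Two different SSIDs count as spelling variants below the threshold."""
    return a != b and normalised_distance(a, b) < threshold


if __name__ == "__main__":
    for pair in [("my network", "MY_NETWORK"), ("my network", "MyNetwork")]:
        print(pair, normalised_distance(*pair), is_probable_typo(*pair))
```

For the three example spellings above, each pair differs in one of ten characters after lowercasing, so all of them fall well below the threshold of 0.3.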

We found that 19.9% of the transmitted SSIDs, stemming from 138 distinct 
bursts, are similar enough to another SSID in the same burst to be considered 
a typo. Such a set of constantly transmitted misspelled SSIDs increases the 
fingerprint of a device drastically and makes tracking it easy. 


Additional Findings. We found at least one string that corresponds to a 
store and the Wi-Fi password of the store’s internal Wi-Fi. We deem this highly 
likely as it began with the letters “PW:” and contained the name of the store 
in the password. We identified 106 distinct first and/or last names, which were 
propagated 3339 times over the course of the experiment. We found three e- 
mail addresses that were propagated 36 times. We identified 92 distinct holiday 
homes or accommodations whose SSIDs users had added to their list of known 
networks, which were propagated 1257 times. In addition, we found the name 
of a local hospital broadcast in two different spelling variations 15 times. It is 
particularly shocking to see sensitive information such as an e-mail address
being transmitted openly, let alone the hospital name, from which a potential 
stay at the hospital can be inferred. At the same time, the name of a person 
or the hotels and holiday homes in which they have stayed can also be used to 
draw conclusions about the person. 


5.2 Geolocation Discoverability and Uniqueness 


To provide a better estimate of whether an SSID exists or not, we ran all observed 
SSIDs through the geolocation lookup API of WiGLE. This way, we were able to
find out whether the captured SSIDs correspond to actual APs catalogued by 
WiGLE. Of course, this approach has one limitation: Mobile devices should, in 
a perfect scenario, only transmit SSIDs of hidden networks. Those should not 
be included in the WiGLE map at all, as the service only maps the transmitted 
SSIDs. 

To evaluate the uniqueness of location of the SSIDs we found, we performed
an analysis on the coordinates that WiGLE returned. To reduce the precision
of the location estimation, we limited the number of decimal places of the
coordinates to 2, thereby providing an approximate 1-kilometre radius in which the
actual network can be found. This also removed artifacts like multiple networks
with the same SSID found within a radius of a few metres, which most likely
belonged to the same network. Our input consisted of 1478 unique SSIDs. We
had to limit our evaluation to a subset of 1440 SSIDs, as the remaining 38
contained special characters that WiGLE cannot resolve. We were able to pinpoint
334 SSIDs to one unique location, and 377 SSIDs returned multiple locations.
729 (50.6%) of the SSIDs could not be localised anywhere in the world. The
latter are either hidden networks that were not mapped by WiGLE due to being
hidden, or mistyped SSIDs.
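
The coarsening of coordinates and the resulting three-way classification can be sketched as follows. The example works on a plain list of (latitude, longitude) pairs and deliberately leaves out the lookup itself. It assumes rounding rather than truncation to two decimal places; the function names and sample coordinates are hypothetical.

```python
def distinct_locations(coordinates, decimals: int = 2) -> set:
    """Collapse coordinates to a coarse grid by rounding to the given decimals."""
    return {(round(lat, decimals), round(lon, decimals)) for lat, lon in coordinates}


def classify_ssid(coordinates) -> str:
    """Label an SSID by the number of distinct coarse locations returned for it."""
    count = len(distinct_locations(coordinates))
    if count == 0:
        return "not found"
    return "unique" if count == 1 else "multiple"


if __name__ == "__main__":
    # Two hits a few metres apart collapse into one coarse location.
    print(classify_ssid([(50.1234, 8.5678), (50.1239, 8.5701)]))
    print(classify_ssid([(50.12, 8.57), (48.14, 11.58)]))
    print(classify_ssid([]))
```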


Password Evaluation. To provide an estimate of how many SSIDs contained
passwords, we filtered the list for


— strings that contain 16 or more numeric digits and
— strings that contain “pass”, “pw”, “kennwort” (the German word for password)
or “wpa”.


Our input consisted of 77 unique strings classified as passwords. The WiGLE 
evaluation resolved only a single one of the strings to a unique location. We 
infer that this is strong evidence that the identified strings are in fact actual 
passwords. 
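
The two filter criteria can be expressed in a few lines. The sketch below is an illustration under assumptions that the text does not spell out: digits are counted over the whole string regardless of separators, and keywords are matched case-insensitively. The function name is hypothetical.

```python
KEYWORDS = ("pass", "pw", "kennwort", "wpa")


def is_probable_password(ssid: str, min_digits: int = 16) -> bool:
    """Apply the two filter criteria: many digits, or a password-related keyword."""
    digit_count = sum(ch in "0123456789" for ch in ssid)
    lowered = ssid.lower()
    return digit_count >= min_digits or any(word in lowered for word in KEYWORDS)


if __name__ == "__main__":
    for candidate in ["1234 5678 1234 5678", "PW: example", "HomeNetwork"]:
        print(candidate, is_probable_password(candidate))
```

A substring match on short keywords such as “pw” can produce false positives, which is one reason why such a filter only yields an estimate.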


Typo Evaluation. We performed the same evaluation on the potential typos 
we identified. In this analysis, we inserted all spelling variations we found of an 
SSID into our evaluation, which amounted to 296 unique strings. The hypothesis 
in this case was that at least half of the SSIDs we identified were in fact typos 
and would not resolve to an existing SSID. We assumed that it would be more 
than 50% since quite a few of the potential typos had more than one spelling 
variation. We assumed that the remaining SSIDs would contain the correct and 
actual spelling that could resolve to an access point. 

Our analysis showed that of the 296 strings classified as potential typos, 
we were able to resolve approximately 41.9% of the SSIDs and identified 47 
unique locations and 66 cases of multiple locations. These results support our 
hypothesis. 


Limitations. The evaluation of networks contained in WiGLE is severely lim- 
ited: Only networks that are not hidden appear in it, while probe requests of 
current devices target only hidden networks. Nevertheless, we found a large per- 
centage of the networks do in fact exist. This can have several explanations: (1) 
multiple networks with the same name exist, and the one in question is in fact 
a hidden one, (2) the network has been set to hidden recently, and the map still 
contains the result of a scan from before, (3) the network was manually added 
in WiGLE. 


6 Mitigations for Increased Privacy


Our experiments detailed in the previous section suggest that users, presumably
mostly by accident or unwittingly, add items to their list of preferred networks,
including credentials and sensitive information. Together with legitimately added
hidden networks, those threaten user privacy by being sent in plain text, making
this information observable and the users traceable. To mitigate the issue, we
first present a proposal to avoid plain text transmission and then show approaches
to limit and control traceable SSIDs through the user interface.


6.1 Hashing SSIDs in Probe Requests


Recall that the introduction of the wildcard SSID in probe requests makes active 
scanning with specific SSIDs only necessary for hidden networks. While some 
publications consider hidden networks obsolete and no longer recommended [15], 
a recent study [25] revealed that in some areas, up to 44% of the detected 
networks were hidden. While the WPA3 standard contains improvements to the
confidentiality of management frames (802.11w) [9], this standard only applies to
frames transmitted after a 4-way handshake, and not to frames transmitted 
without handshakes. Consequently, probe requests are not protected. To rectify 
this, we propose the following mitigation. 


SSID Hashing. To circumvent the need to send cleartext SSIDs, we propose to 
send them in a hashed and salted manner instead. The device emitting the probe 
request would first salt the SSID using its randomised sender MAC address and 
the sequence number of the packet and then hash it. It would then send the
hash, but omit both salt components, as they are included in the frame anyway,
like so:

send(hash(MAC || SN || SSID))


Access points of hidden networks would then, upon receiving the probe 
request, prepend the MAC address and sequence number as salt to their own 
SSID, hash it, and compare the result with the received hash. If they match, 
the client was probing for their hidden network. As the MAC address should
be chosen randomly and the sequence number changes with every packet, their
combination introduces sufficient entropy and variability to be suitable as a
salt, ensuring that the sent information cannot be used to track and identify
devices through a constant hash value. This mitigation could be employed regardless of
whether or not a connection to the network has ever been established before, as 
it does not require the previous exchange of secrets. Another advantage of this 
mitigation is that potentially sensitive SSIDs (e.g., containing names or pass- 
words) can only be distinguished from other SSIDs (e.g., generic home router 
names) with significant effort of brute-forcing the cleartext, thereby improving 
the privacy of clients and AP operators. 
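
A minimal sketch of both sides of the proposed exchange is given below. It assumes SHA-256 as the hash function and a particular byte encoding of the salt (six MAC bytes followed by the sequence number as two big-endian bytes); neither the encoding nor the function names are prescribed by the proposal and serve only as an illustration.

```python
import hashlib
import hmac


def probe_hash(mac: str, sn: int, ssid: str) -> bytes:
    """Compute hash(MAC || SN || SSID) as the 32-byte value placed in the SSID field."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    sn_bytes = sn.to_bytes(2, "big")  # the 12-bit sequence number fits in two bytes
    return hashlib.sha256(mac_bytes + sn_bytes + ssid.encode("utf-8")).digest()


def ap_matches(own_ssid: str, mac: str, sn: int, received_hash: bytes) -> bool:
    """AP side: salt the own SSID with the frame's MAC and SN, hash and compare."""
    return hmac.compare_digest(probe_hash(mac, sn, own_ssid), received_hash)


if __name__ == "__main__":
    mac, sn = "46:8e:45:03:1c:02", 1954
    sent = probe_hash(mac, sn, "hidden_network")
    print(len(sent), ap_matches("hidden_network", mac, sn, sent))
    # The next frame of the burst carries a new SN, hence a different hash.
    print(probe_hash(mac, sn + 1, "hidden_network") != sent)
```

Because the salt is taken from the frame itself, anyone who already knows a candidate SSID can run the same comparison as the AP, which is why the scheme hides unknown SSIDs but does not conceal known ones.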


Attack Model. We consider an attacker that can monitor all probe requests
and has bounded computational power, which makes it impractical for her to
find preimages (SSIDs) of the hashes she observes. Introducing the combined
MAC address and sequence number salt makes pre-computing hashes impractical,
provided that MAC addresses are randomised and sequence numbers are
also randomly chosen within their full 12-bit value range. If the randomised MAC
address contributes 24 bits of entropy, leaving out the EUI/CID parts as a lower
estimate, this totals 36 bits of salt entropy. Attackers with a priori knowledge
of used SSIDs can, however, reduce costs significantly compared to true
brute-forcing, depending on the number of likely used SSIDs. Consequently, tracking
a device whose SSID set is known remains feasible.

To determine the practical feasibility of a hash-based mitigation, we first
evaluate the computational cost and then calculate the additional bandwidth
requirement.


Computational Overhead. Using the SHA-256 hash from Python’s hashlib 
library, we prototyped the hashing and comparison that an AP would have 
to perform: first hashing a received SSID and then comparing it to a known 
hash (the AP’s SSID). For baseline reference, we implemented the current string 
comparison of SSIDs. We performed one million computations in three runs on 
a Raspberry Pi Model 3, which is likely a good lower estimate of computational 
power of most routers. A single computation with hashing required on average
11.7 microseconds of pure CPU time, while the baseline required an average of
4.6 microseconds. Using hashing therefore increases the time by 153.7%. Considering that
the Raspberry Pi does not have cryptographic hardware acceleration, which 
professional Wi-Fi routers might have, we additionally ran the experiment on a 
laptop with an Intel i5, where hashing only added an average overhead of 53%. 

In our experiment in a busy pedestrian zone, we captured around 23 probe 
requests per second, of which only 23.2% contained an SSID. Therefore, follow- 
ing these figures, an AP would have to hash approximately 5.3 probe requests 
per second. As a Raspberry Pi can perform around 85 200 hashing operations 
per second, we deduce that hashing and comparing should be well within the 
available resources for similarly equipped APs even in much more frequented 
deployment locations. 
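
A comparable micro-benchmark can be set up with Python's timeit module. The sketch below is not the original prototype: the SSID, the salt encoding and the number of runs are assumptions, and timeit measures wall-clock time rather than pure CPU time, so absolute figures will differ between machines.

```python
import hashlib
import timeit

OWN_SSID = "hidden_network"  # hypothetical SSID configured on the AP
# Salt as it would be read from the frame: 6-byte MAC plus 2-byte sequence number.
SALT = bytes.fromhex("468e45031c02") + (1954).to_bytes(2, "big")
RECEIVED_PLAIN = "hidden_network"
RECEIVED_HASH = hashlib.sha256(SALT + OWN_SSID.encode("utf-8")).digest()


def baseline() -> bool:
    """Current behaviour: plain string comparison of SSIDs."""
    return RECEIVED_PLAIN == OWN_SSID


def hashed() -> bool:
    """Proposed behaviour: hash the AP's own SSID with the salt, then compare."""
    return hashlib.sha256(SALT + OWN_SSID.encode("utf-8")).digest() == RECEIVED_HASH


def seconds_per_call(func, runs: int = 100_000) -> float:
    """Average duration of one call, measured over the given number of runs."""
    return timeit.timeit(func, number=runs) / runs


if __name__ == "__main__":
    t_base = seconds_per_call(baseline)
    t_hash = seconds_per_call(hashed)
    print(f"baseline: {t_base * 1e6:.2f} us, hashed: {t_hash * 1e6:.2f} us")
    # Load derived from the capture: 23 probes per second, 23.2% with an SSID.
    print(f"hashes needed per second: {23 * 0.232:.1f}")
```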


Bandwidth Overhead. Our proposal may also introduce a bandwidth over- 
head by always occupying the full 32 bytes available for SSIDs in a probe request 
frame [18, Sect. 9.4.2.2], whose length would otherwise vary with the actual SSID 
length. The average length of all packets in our city centre capture is 133.3 bytes, 
while the average length of packets containing SSIDs is 147.0 bytes. In our cap- 
ture, the average length of SSIDs was 11.4 bytes. If all SSIDs were transmitted 
as hashes, it would increase the size by 20.6 bytes, leading to an average size of 
packets with SSID of 167.6 bytes, which is an increase of 14.01%. Considering 
that probe requests make up a tiny fraction of the actual transmitted traffic, we 
consider this an acceptable trade-off for more privacy and less fingerprintability. 
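
The overhead figures follow directly from the averages stated above, as the following short calculation shows. The variable names are chosen for this illustration.

```python
SSID_FIELD_BYTES = 32           # a hash always fills the full SSID field
AVG_SSID_BYTES = 11.4           # average SSID length in the capture
AVG_PACKET_WITH_SSID = 147.0    # average length of packets containing an SSID

extra_bytes = SSID_FIELD_BYTES - AVG_SSID_BYTES      # additional bytes per packet
new_packet_size = AVG_PACKET_WITH_SSID + extra_bytes  # average size with hashing
relative_increase = extra_bytes / AVG_PACKET_WITH_SSID

if __name__ == "__main__":
    print(f"extra bytes per packet: {extra_bytes:.1f}")
    print(f"new average size: {new_packet_size:.1f} bytes")
    print(f"increase: {relative_increase * 100:.2f}%")
```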


6.2 Mitigations Through User Interface Design 


Current iOS and Android versions already employ mechanisms to prevent users
from accidentally adding items to their PNL, to maintain that list and to change
the connection behaviour for individual networks on this list. In the following,
we briefly summarise the status quo and suggest further improvements for more
accessible and effective controls.


[Figure 2, panel (a): iOS "Other Network" dialogue with the fields Name
(non_existent_network), Security (WPA2/WPA3) and Password, showing the error
message "Could not find the network «non_existent_network»". Panel (b): Android
dialogue for adding the network hidden_network with "Hidden network" set to
"Yes" and the warning: "If your router is not broadcasting a network ID but you
would like to connect to it in the future, you can set the network as hidden.
This may create a security risk because your phone will regularly broadcast its
signal to find the network. Setting the network as hidden will not change your
router settings."]


(a) Attempting a connection to a hidden network on iOS 15. Contrary to Android
devices, iPhones only allow manually adding networks to which a connection can
be established.

(b) Warning message when adding a hidden network on Android 9 or newer.


Fig. 2. User dialogues mitigating unwanted SSID entry.


SSID Entry Safeguards. Both iOS and Android have safeguards against acci- 
dentally adding hidden networks: iOS will only add networks to which a connec- 
tion can be established at the time of entry (see Fig. 2a). In Android, a manually 
added network is no longer automatically considered a hidden network. Instead, 
to enter a hidden network, users have to explicitly select it and then receive 
a warning about the privacy risks (see Fig. 2b). We suggest combining both 
measures for manually adding SSIDs. 


Known SSID Removal. It should be possible to remove entries from the list of 
preferred networks in order to reduce traceability and susceptibility to fake AP 
attacks [32]. However, as mentioned in Section 2.3, removing a known network 
from proximity is not straightforward in iOS. On Android, in contrast, the list 
of known networks can be modified directly and at all times. 


PNL Entry Expiry. To remain usable, user-facing lists of preferred networks 
should implement measures against cluttering, which results from steadily 
adding new SSIDs over time. To avoid cluttering with SSIDs that are no longer 
needed, e.g., those added during temporary stays, we propose an expiration date 
for SSID entries: upon adding a new SSID, the user is prompted to choose when 
the SSID should be forgotten again. The default expiry date would be never, 
but for SSIDs that are knowingly only in use for a limited period of a few days 
or weeks, users can choose accordingly. By limiting the life span of a network 
entry, its negative effects on traceability and the chance to exploit them using 
fake AP attacks are at least temporally limited. To the best of our knowledge, 
there is no such mechanism in any OS, but implementing it could significantly 
reduce the number of SSIDs accumulated over time. 
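
To illustrate the proposed expiry mechanism, the following sketch models a PNL whose entries carry an optional expiry date. It is a hypothetical data model, not the implementation of any existing OS; the names `PnlEntry`, `add_network` and `prune_expired` are our own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PnlEntry:
    ssid: str
    expires_at: Optional[datetime] = None  # None models the default "never"


def add_network(pnl: List[PnlEntry], ssid: str, now: datetime,
                lifetime: Optional[timedelta] = None) -> None:
    """Add an SSID; `lifetime` is what the user chose when prompted."""
    expires_at = now + lifetime if lifetime is not None else None
    pnl.append(PnlEntry(ssid, expires_at))


def prune_expired(pnl: List[PnlEntry], now: datetime) -> List[PnlEntry]:
    """Return only the entries that have not expired yet."""
    return [e for e in pnl if e.expires_at is None or e.expires_at > now]
```

A device would call `prune_expired` before probing, so that an SSID added for a one-week stay is neither probed for nor auto-joined once that week has passed.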


Adjustable Auto-Joining. In addition to networks of limited temporal relevance 
that benefit from expiry, there are also networks users use only occasionally 
but regularly, e.g., once every few months or years. While it is convenient to keep 
them in the PNL, this again increases the risk of fake AP attacks. Instead of 
removing such networks, preventing the automatic connection to them already 
effectively reduces the fake AP attack risk. Both iOS and Android offer ways to 
disable auto-joining on a per-network basis. To make this option more visible and 
broadly realise privacy gains, we suggest again prompting users, when they 
initially join a network, whether they want to connect automatically or manually 
in the future. 


Silencing Probe Requests. For particularly high privacy demands, disabling 
probe requests altogether might be an acceptable trade-off. We therefore suggest 
an advanced network setting, where users are able to choose that their devices do 
not send active probe requests at all, knowing that (a) connection establishment 
relies only on passive AP announcements and might be slower, (b) the battery 
usage might be higher, and (c) the connection to hidden networks would be 
impossible. Such behaviour could also be part of a reduced visibility mode that 
users activate, e.g., in the control centre, when they pass through an untrusted 
area like a shopping centre known for extensive visitor analytics, similar to a 
do-not-disturb switch. 


7 Discussion 


Legal Consideration of SSIDs. As mentioned in the introduction, Wi-Fi 
tracking is also used to measure pedestrian flow and count in cities. In two 
German cities, such measurements were conducted until the responsible 
authority started investigations [31]. The authority’s reasoning was that MAC 
addresses are personal data and that it is therefore not legal to record them 
without a legal basis. In both cases, the measurements were discontinued. 
Especially as the continued roll-out of MAC address randomisation might give 
renewed support to 
the legal positions that (randomised) MAC addresses should no longer be con- 
sidered identifying information, we argue that this assessment can and should 
not be limited to the sender address of probe requests. Considering the wealth 
of personal and sensitive information we observed in SSID fields, they can con- 
stitute identifying information as well and thus require due consideration. For 
instance, for 334 of the SSIDs we measured, we were able to identify a unique 
location the SSID originated from. We therefore argue that at least for as long as 
there are still devices broadcasting SSIDs, probe requests should be considered 
personal data and not be used for monitoring without legal basis. 


Intentional Password Broadcast. Some APs might intentionally broadcast 
their password as SSID to allow visitors an unrestricted yet encrypted Wi-Fi 
connection, as an alternative to unprotected Wi-Fi. It is a limitation of our 
password leakage evaluation that we cannot distinguish intentional from 
unintentional cases. However, the fact that we could only resolve one of these 
SSIDs in our geo-lookup suggests that the ratio of SSIDs to actual APs is rather 
low. Additionally, the notion of intentionally storing passwords in the SSID field 
should have been superseded by the introduction of OWE (Opportunistic Wireless 
Encryption): OWE describes the unauthenticated but encrypted connection 
between two devices, and is contained in the Wi-Fi specification under the name 
Wi-Fi CERTIFIED Enhanced Open™ [7]. 


Deployment of Hashed SSID Scheme. Our proposed hash-based scheme 
requires modifications to the Wi-Fi implementations: The mobile device has 
to apply the hashing algorithm to its temporary MAC address, sequence number, 
and the SSID of the sought-after network before transmission. The AP, upon 
receiving a probe request, has to hash its own SSID, using the MAC address and 
sequence number of the request as salt. On the one hand, these changes are easy 
to implement; on the other hand, they require widespread deployment to be 
effective. While the privacy gain would be worth the deployment effort, it is 
unfortunately likely that only newer devices would profit from the scheme, while 
older devices would remain unpatched. 
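
The two sides of the scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not a specification: we assume SHA-256 as the hash function (its 32-byte digest fills the SSID field), a simple concatenation of MAC address, sequence number and SSID, and a two-byte big-endian encoding of the sequence number. The function names are hypothetical.

```python
import hashlib
import hmac


def hashed_ssid(mac: bytes, seq: int, ssid: str) -> bytes:
    """Client side: value placed in the SSID field of the probe request.

    Assumption: SHA-256 over MAC || sequence number || SSID.
    """
    salt = mac + seq.to_bytes(2, "big")
    return hashlib.sha256(salt + ssid.encode("utf-8")).digest()


def ap_matches(own_ssid: str, mac: bytes, seq: int, received: bytes) -> bool:
    """AP side: hash the AP's own SSID with the salt taken from the request
    and compare the result with the received SSID field."""
    expected = hashed_ssid(mac, seq, own_ssid)
    return hmac.compare_digest(expected, received)
```

Because the salt changes with every sequence number and every randomised MAC address, the same SSID yields a different field value in each probe request, while an AP that knows its own SSID can still recognise requests addressed to it.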


Limiting Bandwidth Overhead. Our proposed method of salting and hashing 
the SSIDs of hidden networks to improve user privacy introduces a bandwidth 
overhead (cf. Section 6.1): the average length of a packet containing an SSID 
would be increased by 14%. This could be addressed by truncating the hash 
to, e.g., 16 bytes before inserting it into the SSID field. That way, the average 
packet length would be reduced to 151.6 bytes, which results in an overhead of 
only 3.2%. At the same time, this reduces the security of the system, as hash 
collisions become more likely. 
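
The trade-off can be tabulated for different truncation lengths. The sketch below uses the rounded averages from Section 6.1 and, for the security column, the generic birthday bound of an ideal hash function (an assumption on our part). With these rounded inputs, the 16-byte case evaluates to an increase of about 3.1%, slightly below the 3.2% stated above, presumably because unrounded values were used there.

```python
AVG_SSID_BYTES = 11.4      # average SSID length in the capture
AVG_PACKET_BYTES = 147.0   # average size of packets that contain an SSID


def overhead(hash_bytes: int):
    """Return (new average packet size, relative increase) when every SSID
    is replaced by a hash truncated to `hash_bytes` bytes."""
    extra = hash_bytes - AVG_SSID_BYTES
    return AVG_PACKET_BYTES + extra, extra / AVG_PACKET_BYTES


for hash_bytes in (32, 24, 16):
    size, rel = overhead(hash_bytes)
    # Birthday bound: collisions are expected after about 2^(bits/2) hashes.
    collision_bits = hash_bytes * 8 // 2
    print(f"{hash_bytes:2d} bytes: {size:.1f} bytes/packet, "
          f"+{rel:.1%}, ~2^{collision_bits} hashes until a collision")
```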


Impact of OS Support Lifespans. Besides hidden networks, legacy devices are 
another contributor of non-wildcard probe requests in the wild. While devices 
running Android 10 or newer use MAC address randomisation and omit SSIDs, 
older devices do not receive updates long enough to benefit from this improve- 
ment. This especially disadvantages people with budgetary constraints or sus- 
tainability in mind, who keep their devices longer. This can only be rectified by 
longer support lifespans that should be legally mandated if manufacturers do 
not move voluntarily. Such changes with significance to user privacy should be 
considered like critical security patches and be back-ported to older versions. 


8 Conclusion 


Probe requests are plainly observable to everyone around a sending device. Since 
they can contain sensitive data, they should be sent more carefully and with pri- 
vacy in mind. We have collected and analyzed data in a pedestrian zone to gather 
insight into the status quo of probe requests in the wild. We identified a wealth of 
personal information in transmitted SSIDs such as potential passwords, holiday 
homes and e-mail addresses, which should be considered personal and sensi- 
tive data. For 334 SSIDs, we were even able to resolve their unique geographic 
location. 

In the area of anonymisation of probe requests, some progress has been made 
in the last few years, with the latest mobile operating system updates. Never- 
theless, we are still facing problems in terms of user privacy. To minimise the 
amount of personal data that can be sent accidentally, we propose mitigations 
for both the network layer and the user interface. The former consists of 
hash-based concealment of SSIDs using a salt constructed from the MAC address 
and the sequence number of the request. To demonstrate its feasibility, we provide 
estimates of the computational overhead and additional bandwidth requirements. 
For the latter, we propose to remodel the Wi-Fi handling of mobile devices to 
add only existing networks in range and to empower their users to take more 
control over their PNL and minimise the amount of potentially exploitable data. 


Acknowledgements. We would like to thank our reviewers for their valuable and 
constructive feedback. 


A Appendix 


A.1 Ethical Collection of Probe Requests 


Following approval and in coordination with the ethics committee of the infor- 
matics faculty of the University of Hamburg, we conformed to the following 
measures to observe and protect users’ privacy rights: 


— During the time of the experiment, we set up a well visible sign declaring 
the undergoing probe request monitoring, including information on how to 
contact the person in charge. 

— We informed and obtained consent from building management to conduct the 
experiment. 

— We provided an option to remove recorded probe requests should participants 
state their non-consent. 

— We used off-the-shelf wireless USB antennae with a limited range to narrow 
the radius of our measurement. 

— In case the data set contains personal information, we either anonymise it 
before storing, or delete it directly after analysing it. 

— Any personal data is stored securely, both technically as well as organisation- 
ally, to prevent misuse. 

— To preserve location privacy, we limit the amount of decimal places of the 
coordinates returned by WiGLE to 2, thereby providing an approximate 1- 
kilometre radius in which the actual network can be found. 
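
The last measure amounts to rounding the returned coordinates. As a minimal sketch (the helper name and the example coordinates are ours, not taken from the data set): 0.01 degrees of latitude correspond to roughly 1.1 km, which is where the approximate 1-kilometre radius comes from.

```python
def coarsen(lat: float, lon: float, places: int = 2):
    """Round coordinates to `places` decimal places before storing them."""
    return round(lat, places), round(lon, places)


# Hypothetical example coordinates, used only for illustration.
print(coarsen(53.5511, 9.9937))
```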


Abstract. Deep attestation is a particular case of remote attestation, 
i.e., verifying the integrity of a platform with a remote verification server. 
We focus on the remote attestation of hypervisors and their hosted vir- 
tual machines (VM), for which two solutions are currently supported by 
ETSI. The first is single-channel attestation, requiring for each VM an 
attestation of that VM and the underlying hypervisor through the physical 
TPM. The second, multi-channel attestation, allows VMs to be attested 
via virtual TPMs and separately from the hypervisor; this is faster 
and requires fewer attestations overall, but the server cannot verify the 
link between VM and hypervisor attestations, which comes for free for 
single-channel attestation. 

We design a new approach to provide linked remote attestation which 
achieves the best of both worlds: we benefit from the efficiency of multi- 
channel attestation while simultaneously allowing attestations to be 
linked. Moreover, we formalize a security model for deep attestation 
and prove the security of our approach. Our contribution is agnostic 
of the precise underlying secure component (which could be instanti- 
ated as a TPM or something equivalent) and can be of independent 
interest. Finally, we implement our proposal using TPM 2.0 and vTPM 
(KVM/QEMU), and show that it is practical and efficient. 


1 Introduction 


Network Function Virtualization (NFV) is a technology that promises to provide 
better versatility and efficiency in large-scale networks. The core idea is to move 
from architectures in which physical machines are set up to perform various 
roles in a network, to a design in virtual configuration. As such, a machine could 
be configured and re-configured at distance, and, by judicious use of virtual 
machines, it could perform a variety of roles within the network infrastructure. 

Virtualized platforms are set up in layers, including the following basic com- 
ponents: physical resources, the virtualization layer and infrastructures, virtu- 
alized network functions (VNFs), and the NFV management and orchestration 
module. At the bottom of the infrastructure are real, physical components, meant 
for computations, storage, and physical network functions. The virtualization 
layer (also called hypervisor) manages the mapping between those physical 
components and virtual equivalents. As such, the VNFs, hosted by virtual machines 
running inside the NFV infrastructure, never have direct access to the physical 
resources. Instead, the VNFs access the virtual resources. The NFV manage- 
ment and orchestration module runs the combined infrastructure, including: the 
lifecycle of the instantiated VNFs, resource allocation for VNFs, or overall man- 
agement in view of particular, given network services. 


Deep Attestation (DA). Virtualization enables efficient, versatile remote net- 
work configuration and administration; however, the fact that multiple virtual 
processes share resources can introduce hazards to security. One way to ensure 
that a component runs correctly is by using attestation. Attestation is a pro- 
cess complementary to authentication: whereas the latter allows a platform to 
prove that it is the entity it claims to be, the former ensures that the platform 
runs trustworthy code, i.e., it has not been breached. As described in [13], 
“Attestation is the process through which a remote challenger can retrieve veri- 
fiable information regarding a platform’s integrity state.” A property can be for 
instance software integrity, geolocalisation, access control, etc. 

Attestation relies on a root of trust (RoT), usually instantiated through a 
trusted platform module (TPM) — or an equivalent mechanism. The root of trust 
is responsible, amongst other things, for protecting sensitive cryptographic mate- 
rials (such as private keys) and for running cryptographic operations in an iso- 
lated way. The virtualization layer (hypervisor) has direct access to the RoT, 
but the virtual machines it manages do not; instead they will have access to 
the RoT by means of virtual Roots of Trust (vRoTs). Virtual Roots of Trust 
are a combination of resources, some provided by the physical RoT and others 
managed by the hypervisor, which directs and mediates access to the RoT. 

In a nutshell, attestation is a process which allows an independent, remote 
verifier to check that a target platform still behaves in the desired way. This is 
done by first authenticating the RoT, then by comparing a measurement of the 
current state of the component to a presumably-correct state, as indicated in a 
Root of Trust for Storage (RTS). In addition, a guarantee must be given of the 
correctness of the RTS, which is done by means of a Root of Trust for Reporting 
(RTR). Functionalities of RTS and RTR can be provided by a TPM. A TPM is an 
example of an implementation that could provide RTR and RTS by leveraging the 
specific tamper-detection properties of its Platform Configuration Registers 
(PCRs) and issuing signed reports, or quotes, of their content. 

We consider the attestation of two types of components: virtual machines 
(VMs), such as VNFs, and the hypervisor managing them, whose underlying 
physical component includes a RoT providing an RTR and an RTS. This archi- 
tecture is depicted in Fig. 1. 

To verify that the VMs and the hypervisor are running correctly, both these 
types of components must undergo remote attestation. First, each component must 
attest in isolation; then we must attest the layer-binding between VMs running 
on the same hypervisor. This is known as deep attestation (DA). There are two 
typical ways of achieving deep attestation (as described by ETSI standardization 
documents [13]): single- and multi-channel VM-Based Deep Attestation. 

Fig. 1. The setup for DA. 


Single/Multi-channel Deep Attestation. In single-channel deep attestation 
the attestation is run only between the remote verifier and the virtual machines. 
At each attestation, the VM (by querying its associated virtual TPM, or vTPM) 
provides not only an attestation for itself, but also the hypervisor it runs on. 
Specifically the response forwarded by the VM to the remote verifier includes 
the (independent) attestation of the hypervisor, and the layer-binding 
attestation between the VM and its hypervisor. This is depicted in Fig. 2, on 
the left-hand side. Note that the quotes in this case are both obtained from the 
(slow) physical TPM. From the point of view of security, this solution is 
optimal; however, it scales poorly. Given as few as 1000 VMs running on top of 
the hypervisor, we would require that the hypervisor be attested 1000 times, 
once for each VM. 

Fig. 2. Single vs multi-channel DA 

By contrast, in multi-channel deep attestation, the VMs are attested sepa- 
rately and independently from the hypervisor. In this scenario, the VMs attest 
to the remote verifier, thus proving they were not tampered with. Separately, 
the hypervisor also attests to the remote verifier. This can be seen on the right 
hand side of Fig. 2. In this case, the efficiency is optimal: for 1000 VMs, we have 
1000 VM-attestations and 1 hypervisor attestation. However, there is virtually 
no layer-binding between the VMs and their hypervisor: there is no guarantee 
that the VMs are really managed by the hypervisor. An attacker could therefore 
“convince” a party (such as the owner of the infrastructure) that a VM still 
exists on a given physical machine when it has, in fact, been removed. 


Our Solution. We take the middle path between single- and multi-channel deep 
attestation to obtain layer-binding between VMs and hypervisors with reason- 
able efficiency. Our solution is simple, yet elegant, using standard cryptography 
to ensure that a hypervisor’s single attestation is linkable to any number of 
attestations of VMs managed by it. We give three contributions: 


A Cryptographic Scheme. Our scheme ensures secure and efficient linked 
DA. The hypervisor and VMs each attest only once. However, we also embed a 
list of public keys (associated with the VMs managed by the hypervisor) within 
the hypervisor attestation, which is established by the root of trust. In order 
to authenticate the list of forwarded keys, we embed them into the attestation 
nonce, forwarded by the attestation server. If the hypervisor’s attestation 
verifies, then the attestation server can link that hypervisor with the 
(subsequently attesting) VMs which use keys in the forwarded list. If the 
hypervisor’s attestation fails, then the public keys cannot be trusted. 
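
The key-binding idea can be illustrated with a toy model. The sketch below is not the construction analysed in this paper: we assume SHA-256 for folding the key list into the nonce, and we use an HMAC under a key shared with the verifier as a stand-in for the quote, whereas a real TPM quote is a signature that covers the PCR contents and the supplied nonce. All names are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence


def bind_keys_to_nonce(server_nonce: bytes, vm_public_keys: Sequence[bytes]) -> bytes:
    """Fold the list of VM public keys into the nonce that gets attested."""
    h = hashlib.sha256()
    h.update(server_nonce)
    for pk in vm_public_keys:
        h.update(len(pk).to_bytes(4, "big"))  # length prefix avoids ambiguous concatenation
        h.update(pk)
    return h.digest()


def toy_quote(rot_key: bytes, nonce: bytes) -> bytes:
    """Stand-in for the root of trust's quote (assumption: HMAC, not a TPM signature)."""
    return hmac.new(rot_key, nonce, hashlib.sha256).digest()


def server_accepts(rot_key: bytes, server_nonce: bytes,
                   claimed_keys: Sequence[bytes], quote: bytes) -> bool:
    """The server recomputes the bound nonce from the forwarded key list."""
    expected = toy_quote(rot_key, bind_keys_to_nonce(server_nonce, claimed_keys))
    return hmac.compare_digest(expected, quote)
```

If the forwarded key list is altered in transit, the recomputed nonce no longer matches the one covered by the quote, so the hypervisor attestation fails and none of the listed keys is trusted.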


Provably Secure Authorized Linked Attestation. An important advan- 
tage of our approach is that we have a fully-formalized provable-security guar- 
antee. We use a composition-based approach, constructing primitives that are 
increasingly stronger out of weaker ones. Our goal is to ultimately obtain autho- 
rized linked attestation (ALA): a primitive which allows components to individ- 
ually attest (to an authorized entity), and to have their attestations linked. This 
primitive solves the problem outlined in the introduction, since VMs sharing the 
same hypervisor will attest in isolation and together with their hypervisor. 

ALA schemes will have three properties: authorization (only an authorized 
server can query an attestation quote); indistinguishability (no Person-in-the- 
Middle adversary can know even a bit of a quote exchanged during a legitimate 
protocol with probability significantly better than $1/2$); and linkability (an 
attestation server can detect if two components are not linked). 

We choose to formalize ALA security as the last of a sequence of primitives, 
each potentially of independent interest and providing gradually stronger prop- 
erties. This approach has two virtues: first, we are able to use weaker primitives 
as black-box components in stronger primitives; and second, the individual proof 
steps are shorter and smoother. 

At the basis of our construction is a yea-or-nay basic attestation scheme, 
which is “secure” by assumption. Its functionality is simple: the basic attesta- 
tion scheme outputs a faulty attestation whenever a component is compromised, 
and a correct one for honest components. In other words, this basic attestation 
scheme is a compromise-oracle: when queried it (indirectly) produces a proof of 
whether a component has been tampered with or not. 

Based on this assumption, we build a sequence of cryptographic mechanisms 
that add security against stronger adversaries. A first step is to build authen- 
ticated attestation: a scheme which allows us to authenticate the component 
that provides the attestation, and additionally ensures that this component’s 
attestation always verifies prior to corruption, but fails to verify as soon as 
a compromise occurs. We can think of authenticated attestation as the minimum 
provided (and required) by multi-channel attestation. Then, we consider linked 
attestation: a scheme that introduces the hypervisor-VM relationship described 
above, and permits not only the verification of individual attestations, but also 
(publicly) linking attestations. 


Implementation. We used a regular laptop equipped with TPM 2.0 (as a root 
of trust). We set up an architecture with one hypervisor and multiple VMs. 
The VMs used a full virtual TPM as a virtual root of trust. We ran over 100 
experiments, which showed that our solution is more efficient than the 
single-channel approach and adds only an insignificant overhead (a hash function 
computation) compared to traditional multi-channel DA. 

Our work is, to the best of our knowledge, the first that attempts to provide a 
sound cryptographic treatment of deep attestation. In many ways, this is much 
harder than designing the scheme that we present, because attestation is a 
generic term comprising an entire class of algorithms that have different goals. 
As such, we are only scratching the surface here, and believe that, aside from 
the real and practical advantages of our presented construction, our 
cryptographic treatment, primitives, and proofs may be of independent interest 
to this line of research. 


Limitations. A first fundamental limitation is the fact that we assume, in our 
constructions, the existence of a basic attestation primitive that works infallibly 
like an oracle, telling us if a component is compromised or not. In reality, this 
primitive is based on the Platform Configuration Registers (PCRs) of a TPM. 
A PCR can store hash digests into a register of the length of the hash function 
output. Typically a TPM will have multiple banks corresponding to various 
hash functions (e.g., a sha1 bank and a sha256 bank) with 24 registers for 
each bank. PCRs are reset at each boot and can only be updated through an 
extension operation $\mathrm{PCR}_i \leftarrow H(\mathrm{PCR}_i \,\|\, H(\mathrm{measurement}))$. 
We assume the attacker has no physical access to the component and thus cannot 
tamper with TPM measurements by using hardware attacks. In practice, this is 
somewhat limiting since we do not account for runtime corruption; thus, the 
primitive is vulnerable to Time of Check Time of Use (TOCTOU) attacks. Several 
mechanisms have been proposed to monitor runtime integrity, e.g., LKIM [19] or 
DynIMA [12]; moreover, in recent years several advancements were made towards 
verifying runtime integrity for IoT devices [15,17]. Yet, these solutions are not 
as widespread today as TPM-based attestation at startup. 
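
The extension operation can be mimicked in software to show why a PCR value commits to the whole sequence of measurements. This is only a sketch: it assumes the sha256 bank and an all-zero value after reset, and the function names are ours rather than TPM commands.

```python
import hashlib
from typing import Iterable


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """New PCR value: H(old PCR value || H(measurement)), here with SHA-256."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


def replay(measurements: Iterable[bytes]) -> bytes:
    """Recompute the expected PCR value from a measurement log."""
    pcr = bytes(32)  # assumption: the register is all zeros after reset
    for m in measurements:
        pcr = pcr_extend(pcr, m)
    return pcr
```

Since a register can only be extended and never set directly, a verifier that replays the expected measurements obtains the same value only if the same components were measured in the same order.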

We treat the existence of basic attestation as an assumption because we do 
not see a way of constructing it with cryptographic tools. The cryptography 
we put on top adds a lot of new properties: authenticity, confidentiality, autho- 
rization, linkability, but not the simple fact of distinguishing a compromised 
component from an honest one. Our result should therefore be interpreted as a 
need for such a scheme to exist, as in fact required by ETSI [13]. 

Another limitation of our scheme lies in our model of linked-attestation com- 
ponent. We consider classes of components which can be linked. At registration 
of each piece of hardware, a number of subcomponents of each type is indicated — 
and (unique) keys are given to those components. As a result, we cannot account 
for having two hypervisors that manage the same VM on a given infrastructure. 
Future work could consider multi-hypervisor VMs as introduced in [14]. 


Related Work. Many attacks have recently been reported on remote attestation 
mechanisms [10] and on 5G standards [16]. Tools such as formal methods or 
cryptography can be used to model and prove the security of such standards. 
However, the current lack of formalization must be addressed; otherwise, attacks 
will keep multiplying. Provable security is a natural way to approach this 
problem, since it clarifies the security model: what the adversary’s goal is, 
what its means are, and which oracles it can query. Some cryptographic 
primitives have already been formalized in this way, such as Direct Anonymous 
Attestation (DAA), formalized by Brickell et al. in [9], which enables remote 
authentication of a trusted computer (a TPM, for instance) while preserving the 
privacy of the platform’s user. It is a group signature without the feature 
that a signature can be opened, i.e., the anonymity is not revocable. Such 
primitives are well described cryptographically as variants of signature 
schemes. Provable security has also been used successfully to formalize security 
protocols such as authenticated key exchange [7,11]. Our goal is precisely to 
model the different security components independently and to compose them to 
prove the security of a new security mechanism. Indeed, the attestation server 
must authenticate the whole platform, i.e., the hypervisor and the NFV running 
on top. This problem has been addressed by others in the context of secure boot 
or, for instance, in [6], where the authors propose an attestation mechanism for 
the software of device swarms in IoT and embedded environments. As noted in [5], 
software attestation differs from remote attestation in that it cannot rely on 
cryptographic secrets to authenticate the prover device. Lauer and Kuntze [18] 
were the first to consider deep attestation, but their solution lacks a security 
proof and a rigorous analysis. 


2 Towards Authorized Linked Attestation 


Our core contribution provides layer-binding in deep attestation. Cryptographi- 
cally, we view this as a new primitive, which we call authorized linked attestation, 
built in steps from increasingly-stronger primitives. Each of these intermediate 
steps plays a double role: on the one hand, it formalizes security guarantees that 
are of independent interest for attestation (if, for instance, layer-linking is not 
required); on the other hand, it provides an intuition of the guarantees which 
specific cryptographic primitives can help achieve. 

The first and most basic step in our architecture is basic attestation. This 
primitive is an abstraction of the algorithm by which a single party (like a 
component of a virtualized platform) generates an attestation of its state, given 
a fresh, honestly-generated nonce. Importantly, basic attestation does not employ 
cryptography to achieve this feature, but rather, the attestation of registers at 
startup, using a RoT.¹ 


1 To ease notation, we assume that all the registers are attested, and that the property 
we are attesting is that the entire component has not been compromised. 


A Cryptographic View of Deep-Attestation 405 


Authenticated attestation builds on basic attestation by associating parties 
with identities. The attestation must now no longer only indicate whether the 
party is compromised: it must also authenticate the component. Here, we thus enhance 
basic attestation with a cryptographic component, which is in fact sufficient 
to guarantee the basic functionality required by multi-channel attestation. One 
step further, the linked-attestation primitive built from authenticated attestation 
will allow two different components to (a) attest their own states; (b) provide 
auxiliary material that will make two separate attestations linkable. While this 
primitive has no immediate parallel in real-world attestation, we use it as a 
handy way of dividing the security proof of our ultimate result into two: linked- 
attestation will focus on proving the fact that two attestations can be securely 
linked; whereas authorized linked attestation models attestation as a protocol, 
using fresh randomness and a secure channel using an honest attestation server. 

We also add a new party into the system: the attestation server that serves as 
a verifier. We then compose the linked-attestation primitive with a unilaterally- 
authenticated authenticated key-exchange protocol, which will authenticate the 
attestation server and permit the attestation itself to remain confidential with 
respect to a Person-in-the-Middle (PitM) adversary. 


2.1 Basic Attestation 


During basic attestation a single honest party is generated. This party can be 
later compromised. A quote-generation algorithm will output a quote if the party 
is still honest at that time, or a special symbol if it is not. Finally a (public) 
verification algorithm will yield 1 (the component is honest) or 0 (otherwise). 

Note that a party such as the one we describe could correspond in practice to 
a combination of two parties: a virtual entity (like a VM or the hypervisor) and 
an underlying, uncorruptible, secure part (the TPM), which actually generates 
the quote. At this stage, we importantly do not associate these entities with keys 
as authentication will only appear in our next step (Sect. 2.2). 

What we want to capture, formalized by the security of basic attestation, 
is the minimal assumption that a compromised component will always yield 
an attestation that will fail the verification. This is why, when basic 
attestation is run for a compromised component, it will yield the special symbol 
$\bot$. We also demand correctness: when a non-$\bot$ quote is generated, the latter will 
automatically verify. Our basic attestation component thus becomes the mini- 
mal non-cryptographic assumption that we need to make to prove our scheme 
secure. 


Formalization. We consider an environment parametrized by a security parameter 
λ, in which we have a single party P. This party keeps track of a single 
attribute, namely a compromise bit y originally set to 0. Once this bit is flipped 
to 1, it can never go back to 0. We define a primitive BasicAtt as a tuple of 
algorithms (aBSetup, aBAttest, aBVerif): 


– aBSetup(1^λ) → ppar: on input the security parameter 1^λ (in unary), this algorithm 
outputs some public parameters ppar. 

– aBAttest(ppar) → quote: on input the public parameters ppar, if P.y = 0, 
then this algorithm outputs an attestation quote quote ≠ ⊥ for P, and if 
P.y = 1, then it outputs ⊥. 

– aBVerif(ppar, (quote ∪ ⊥)) → 0 ∪ 1: on input public parameters ppar and a 
value that is either a quote denoted quote or the special symbol ⊥, this algorithm 
outputs a bit. By convention, an output of 0 means the attestation fails, while 
if the output is 1, the attestation succeeds. We require by construction that 
for all ppar: aBVerif(ppar, ⊥) = 0. 


This primitive is also depicted in Fig. 3. We assume that if P.y = 0 and 
quote ← aBAttest(ppar), then aBVerif(ppar, quote) = 1. 
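
As an illustration only, the BasicAtt syntax can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the names `Party`, `ab_setup`, `ab_attest`, `ab_verif` and `BOTTOM` are hypothetical, the failure symbol is modelled as `None`, and a quote is modelled as random bytes because the primitive is assumed rather than constructed.

```python
import secrets
from dataclasses import dataclass

BOTTOM = None  # hypothetical stand-in for the special failure symbol


@dataclass
class Party:
    compromised: bool = False  # the compromise bit y; it is never reset

    def compromise(self) -> None:
        self.compromised = True


def ab_setup(sec_param: int) -> dict:
    """aBSetup: output public parameters (here only the security parameter)."""
    return {"lambda": sec_param}


def ab_attest(ppar: dict, party: Party):
    """aBAttest: a quote if the party is honest, BOTTOM otherwise."""
    if party.compromised:
        return BOTTOM
    # Assumption: an honest quote is modelled as lambda random bits.
    return secrets.token_bytes(ppar["lambda"] // 8)


def ab_verif(ppar: dict, quote) -> int:
    """aBVerif: 0 on BOTTOM, 1 on any quote that aBAttest produced."""
    return 0 if quote is BOTTOM else 1
```

In this idealised model verification accepts every value other than `BOTTOM`, which mirrors the assumption that only the trusted part of the party ever produces quotes.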


Security. The only security we demand from this primitive is that, if a party is 
compromised, then its attestation will always fail. This will happen by construction 
(since this is an assumed primitive) and is embedded in the security model. 
The adversary A will play a game against a challenger G. Initially, the challenger 
sets the system up by running aBSetup to output ppar, which is given to A. The 
unique party is generated such that its compromise bit is set to 1 (P.y = 1). 

Since A now has ppar, it can run the aBAttest and aBVerif algorithms. 
In addition, it has access to the OBAttest oracle: OBAttest() → (quote ∪ ⊥). This 
oracle calls the aBAttest() algorithm for the (corrupted) party P and returns 
the output to the adversary A. The challenger stores the result in a database 
DB. The adversary wins if, and only if, there exists a quote in DB (possibly with 
quote = ⊥) such that aBVerif(ppar, quote) = 1. Note that by construction our 
basic attestation primitive is secure, since once the compromise bit is set, the 
output is ⊥, which always yields aBVerif(ppar, ⊥) = 0. 
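
The game can be simulated in a few lines. The following is a hedged sketch under the same idealised model as the definition above; `basic_att_game`, `ab_attest`, `ab_verif` and `BOTTOM` are hypothetical names, and the adversary is reduced to a fixed number of oracle queries.

```python
import secrets

BOTTOM = None  # hypothetical stand-in for the failure symbol


def ab_attest(compromised: bool):
    """Idealised aBAttest for a party with the given compromise bit."""
    return BOTTOM if compromised else secrets.token_bytes(16)


def ab_verif(quote) -> int:
    """Idealised aBVerif: BOTTOM never verifies."""
    return 0 if quote is BOTTOM else 1


def basic_att_game(num_queries: int) -> bool:
    """Return True if, and only if, the adversary wins the game."""
    compromised = True               # the unique party is compromised
    db = []                          # the challenger's database DB
    for _ in range(num_queries):     # the adversary's OBAttest queries
        db.append(ab_attest(compromised))
    # The adversary wins if some stored value verifies.
    return any(ab_verif(q) == 1 for q in db)
```

Every stored value is `BOTTOM`, so the function returns `False` for any number of queries, which is the "secure by construction" property stated above.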


Basic Attestation in Reality. One may wonder at this point what our purpose 
might be in constructing a security model for a primitive that is by definition 
correct and secure. We need that security model in our reductions: we will use 
the attestation primitive to build stronger, linked attestation, and then we will 
want to make the argument that if an attacker can break the larger primitive, 
it will also break the smaller primitive. As the smaller primitive is secure by 
design, this is not possible, and hence, the larger primitive is also secure. 


2.2 Authenticated Attestation 


Target T                                     Appraiser 

Setup phase: aBSetup(1^λ) → ppar 

aBAttest(ppar) → quote    ---- quote ---->   aBVerif(ppar, quote) → 0 if T compromised (T.y = 1) 
                                                                  → 1 if T uncompromised (T.y = 0) 


Fig. 3. Basic attestation description with an honestly-generated target. Notice that 
there is no authentication involved. 


Basic attestation acts as a foolproof way of telling whether a device is compromised 
or not. However, the security it provides is very weak. For one thing, it has 
no authentication guarantees, so potentially one could use a quote that was honestly 
generated for an honest component to attest a compromised one. Another 
problem that is more subtle concerns the way components are compromised. 
Because the basic quotes described in the previous section have no timestamp, 
nor specific freshness, we cannot take into account adaptive tampering. In the 
security notion, the party generating the quote is either honest or compromised 
from the beginning. Yet, ideally we would like a primitive that ensures that a 
party can start out as honest (and all the quotes generated at that time verify 
as correct), and later be compromised (and all the quotes generated after that 
moment will fail). We can do this by deploying cryptographic solutions. 

A relevant question is why we did not include these security aspects in the 
basic attestation primitive considered above. To answer this, recall that we have 
constructed the basic attestation tool to be secure by design. As such, it is an 
assumption, rather than a solution. If we also assume authentication, it would 
go against the principle of using minimal assumptions. 


Correctness. The correctness of our construction depends on the detection of 
a compromised component. There are three cases to consider. Assume first that 
the component is compromised. In that case, the output attestation is ⊥. The 
component can try to authenticate this quote, but the verification will fail. In the 
second case, the component (VM or hypervisor) is not compromised, and so will 
receive a valid attestation quote, authenticated by the TPM. This authenticated 
quote will verify. Finally, in the third case, the component is not compromised, 
and receives a valid authenticated quote. At this point, the adversary might try 
to forward the authenticated quote and pass it off as someone else’s attestation, 
but this will fail as long as the authentication primitive is EUF-CMA secure. 


Formalization. The precise formalization of this primitive is in the full version 
of our paper [4]. We consider an environment containing up to N parties. The 
parties keep track of the compromise bit y used also for basic attestation, and a 
pair of public and secret keys denoted, for each party P, P.pk (the public key) and 
P.sk (the private key). Intuitively, the security we require for this primitive will 
be that a valid authenticated quote for a party P and fresh auxiliary information 
(used as nonce) is hard to forge by an adversary which knows all the public 
information, can register and compromise users, and query an attestation oracle 
that returns a valid quote or ⊥. In particular, in a secure scheme, verification 
should fail if either the authentication or the attestation fails. 


Construction. We construct an authenticated attestation scheme out of basic 
attestation, a large set of nonces 𝒩 := {0,1}^ℓ (with ℓ chosen as a function 
of the security parameter λ), and an EUF-CMA-secure signature scheme Sig = 
(aSigKGen, aSigSign, aSigVerif). We thus instantiate AUX := 𝒩, and our 
AuthAtt scheme is as follows: 


– aAuthSetup(1^λ) → ppar: this algorithm runs aBSetup(1^λ) a number N of 
times, outputting ppar_1, ppar_2, ..., ppar_N. Each time ppar_i is created, a party 
handle P_i is also created (it will be the party associated with the instance of 
BasicAtt run for those parameters). It sets ppar := (ppar_1, ppar_2, ..., ppar_N, 
N), and outputs this value (Fig. 4). 

– aAuthKGen(P_i) → (P_i.pk, P_i.sk): it keeps a counter (starting from 0), which 
indicates how many times this algorithm has been run. If at the time this algorithm 
is queried counter < N, then aAuthKGen runs aSigKGen as a black box 
and outputs the resulting (pk, sk) (public and private) keys. It sets P_i.pk := pk 
and P_i.sk := sk. Party P_i is then initialized with these keys. 

– aAuthAttest(ppar, P.sk, R) → authQuote ∪ ⊥: on input the public parameters 
ppar, a private key P.sk of a party P (which has already been registered), 
and a value R ∈ 𝒩, this algorithm first runs quote ← aBAttest(ppar), 
then the algorithm signs σ ← aSigSign(P.sk, (quote, R)), that is, it signs 
a concatenation of the nonce and the obtained quote. The output of this 
algorithm is authQuote := (quote, σ). If the required party or key does not 
exist, the value ⊥ is output by default. If quote = ⊥, then we instantiate 
authQuote = ⊥. 

– aAuthVerif(ppar, P.pk, R, (authQuote ∪ ⊥)) → 0 ∪ 1: on input public parameters 
ppar, a public key P.pk of a party P, and an auxiliary value R ∈ 𝒩, 
this algorithm first checks if the last input is ⊥; if so, the algorithm outputs 
0 by default. Else, the algorithm parses authQuote = (quote, σ) (with 
quote ≠ ⊥ by construction), then runs b ← aSigVerif(P.pk, (quote, R), σ) and 
d ← aBVerif(ppar, quote). The algorithm outputs b ∧ d. Notably, 1 is output 
if, and only if, signature and basic attestation verify concomitantly. 


Target T                                        Appraiser 

Setup phase: aAuthSetup(1^λ) → ppar 
             aAuthKGen → (T.pk, T.sk) 

aAuthAttest(ppar, T.sk, aux) → (quote, σ)   ---- (quote, σ) ---->   aAuthVerif(ppar, T.pk, aux, (quote, σ)) 
    → 0 if T compromised (authQuote = ⊥ or σ invalid) 
    → 1 if T uncompromised (authQuote ≠ ⊥ and σ valid) 


Fig. 4. Authenticated attestation built upon basic attestation (Fig. 3). 
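
The attestation and verification steps of AuthAtt can be sketched as follows. This is a minimal sketch under loud assumptions: the signature scheme is replaced by an HMAC stand-in in which pk and sk are the same key (so it is not a public-key signature and not the scheme used in the paper), basic attestation is the idealised model with `None` as the failure symbol, the signature is checked over the same (quote, R) pair that was signed, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BOTTOM = None  # hypothetical stand-in for the failure symbol


# Toy stand-in for Sig: HMAC with pk == sk. NOT a public-key signature;
# a real instantiation would use an EUF-CMA-secure scheme.
def sig_kgen():
    key = secrets.token_bytes(32)
    return key, key  # (pk, sk)


def sig_sign(sk: bytes, msg: bytes) -> bytes:
    return hmac.new(sk, msg, hashlib.sha256).digest()


def sig_verif(pk: bytes, msg: bytes, sigma: bytes) -> int:
    return int(hmac.compare_digest(sig_sign(pk, msg), sigma))


# Idealised basic attestation: fixed-length random quote, or BOTTOM.
def ab_attest(compromised: bool):
    return BOTTOM if compromised else secrets.token_bytes(16)


def ab_verif(quote) -> int:
    return 0 if quote is BOTTOM else 1


def auth_attest(sk: bytes, nonce: bytes, compromised: bool):
    """aAuthAttest: sign (quote, R), or output BOTTOM if the quote is BOTTOM."""
    quote = ab_attest(compromised)
    if quote is BOTTOM:
        return BOTTOM
    # The quote has fixed length, so quote + nonce parses unambiguously.
    return quote, sig_sign(sk, quote + nonce)


def auth_verif(pk: bytes, nonce: bytes, auth_quote) -> int:
    """aAuthVerif: output b AND d, or 0 on BOTTOM."""
    if auth_quote is BOTTOM:
        return 0
    quote, sigma = auth_quote
    b = sig_verif(pk, quote + nonce, sigma)
    d = ab_verif(quote)
    return b & d
```

Binding the nonce into the signed message is what lets a verifier reject a quote replayed under a different nonce or presented under another party's key.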


Theorem 21 (Secure Authenticated Attestation). The AuthAtt scheme 
is secure assuming that (1) the BasicAtt scheme is secure, (2) the size of 𝒩 is large, 
and (3) the Sig signature scheme is EUF-CMA secure. 


The proof is given in the full version of our paper [4]. 


2.3 Linked Attestation 


Authenticated attestation allows the attestation of one (out of many) compo- 
nents, based on that component’s unique secret key. If we define now parties as 
being either VMs or hypervisors, the notion of authenticated attestation suffices 
to capture the basic guarantees of multi-channel deep-attestation. However, in 
this paper our goal is to allow parties to link their attestations (a hypervisor’s 
attestation should, e.g., be linkable to various VMs hosted on that platform). 

In this section we describe our next primitive: linked attestation. The latter 
takes place in an environment where several parties are registered in a linked way 
— this corresponds to a single platform. A first step is platform registration, by 
which several parties are linked on the same underlying hardware. Each entity 
later generates a linkable attestation — verifiable on its own, and linkable with 
other linkable attestations. 

Although our application scenario is that of linking VM and hypervisor attestations, 
we make our framework more generic than that. Instead of just two types 
of components, we consider linkable sets S_1, S_2, ..., S_L, which resemble equivalence 
classes. These sets are defined such that any party in one set (say P_{S_1}) 
can produce an attestation that is linked to attestations produced by parties in 
sets S_2, ..., S_L. We write P ∘ Q to say that two parties are linked. The relation 
is reflexive (P ∘ P), symmetric (if P ∘ Q, then Q ∘ P), and transitive (if P ∘ Q and 
Q ∘ R, then P ∘ R). 

We formalize a linked-attestation scheme LinkedAtt as a tuple of algorithms 
LinkedAtt = (aLSetup, aLReg, aLAttest, aLVerif, aLLink), defined for some 
auxiliary set AUX. The detailed formalization is given in the full version [4]. 

The setup algorithm outputs public parameters ppar, including the maximal 
number L of sets considered for linking. One can register platforms including 
subsets of components of each type: this algorithm generates keys for each 
party. A linked attestation algorithm produces a linked quote linkedQuote and 
an auxiliary linking value lkaux. Finally, the verification algorithm checks the 
attestation in each individual linkedQuote and the linking algorithm outputs 1 if 
several linked attestations seem to belong to the same registered platform, and 
0 otherwise. This syntax is also depicted in Fig. 5. 


[Figure: two registered platforms, Platform 1 and Platform 2, each composed of 
subsets S_1 and S_2 and enclosed by a dashed line. Three linkedQuote values are 
verified (Verif: (⊤, P_{1,2}), Verif: (⊤, Q_{1,1}), Verif: (⊤, P_{2,2})). Link: ⊤ for the 
pair on Platform 1, Link: ⊥ for the pair spanning both platforms.] 


Fig. 5. Linked attestation primitive. Dashed lines indicate platforms under the same 
registration. Here, both platforms are composed of two subsets (S_1, S_2). There are a 
total of three quote verifications (P_{1,2}, Q_{1,1}, P_{2,2}). The link verification outputs true for 
devices registered on the same platform and false otherwise. 

The security of linked attestation informally states that an adversary, which 
has Person-in-the-Middle capabilities and can compromise devices at will, cannot 
make it appear that two devices are linked when they are not, in fact, so. 

A significant limitation on the adversary’s capabilities is that compromising 
a device will not leak its private keys (which are assumed to be held by a TPM). 
However, the adversary will gain a limited oracle access to those keys upon 
compromising the device. The limitations to those queries follow rules of access 
to an actual TPM. 

More formally, we define the security of linked attestation as a game 
LinkSec_{λ,F}, parametrized by a security parameter λ and a set of functions F, 
which we call the permitted key-access functions. The adversary wins if it is able 
to make attestations stored in La for parties registered on different platforms 
(P and Q) link. However, at this point the adversary is constrained to a 
change-one-change-all kind of game: it cannot, for instance, append an lkaux component 
of its choice to an honestly-generated linkedQuote, nor vice-versa. 

In the security game, the adversary registers platforms and can compromise 
some of their components. When a component is compromised, the adversary 
gets oracle access to a set of permitted functions of the component’s private key. 
As a result, the strength of the security proof depends on the function space 
F. The more functions the adversary is able to query once it compromises a 
component, the more security our primitive is able to provide. However, note 
that we cannot give the adversary access to some functions, such as the identity 
function on the component’s private key. 


Construction. We provide a construction for platforms that have two types 
of components: virtual machines (VMs) and their managing hypervisor. Thus, 
in our instantiation, L = 2. We use an authenticated attestation scheme 
(aAuthSetup, aAuthKGen, aAuthAttest, aAuthVerif) as a black box. The basic 
construction is depicted in Fig. 6. During setup, our linked-attestation scheme 
first runs aAuthSetup and outputs ppar and L = 2. Note that by construction 
aAuthSetup must output a number N, denoting the maximal number of parties 
that can be set up. This counter will represent a global maximum to parties of all 
types that will exist in our ecosystem. Following setup, one can register a subset 
of VMs together with a hypervisor. The algorithm runs the key-generation algorithm 
aAuthKGen of the underlying authenticated attestation scheme for each 
party, independently (note that this also ensures that the total number of parties 
remains at most N). Finally, keys are grouped by types of parties: keys of 
VMs are output in a set of public keys PK_1, and the key of the hypervisor is 
output as PK_2. 

The VMs and hypervisor generate linked attestations differently. The hypervisor 
first fetches the public keys of all the components registered with it on the 
same platform. It computes a new nonce as the hash of two concatenated values: 
the original auxiliary value aux and the list of the public keys. The hypervisor 
then runs aAuthAttest on the public parameters, this new nonce, and its private 
key, outputting the authenticated quote. By contrast, when a VM attests, 
it computes a new nonce from the original auxiliary value aux and (only) its own 
public key. The authenticated quote is provided as the VM’s linked quote. 


aLSetup(1^λ): 
    ppar' ← aAuthSetup(1^λ) 
    Return ppar ← ppar' 

aLReg(s_1, s_2): 
    // Registers a platform with set s_1 of VMs and the hypervisor in s_2 
    For each i ∈ {1, 2}: 
        For each j ∈ s_i: 
            (P_j.pk, P_j.sk) ← aAuthKGen(P_j) 
        Group all P_j.pk into PK_i and all P_j.sk into SK_i 
    Return {(PK_1, SK_1), (PK_2, SK_2)} 

aLAttest(ppar, PK, P.sk, (s_1, s_2), aux): 
    // Attesting VM P on platform (s_1, s_2) for nonce aux 
    Get P.pk matching P.sk from PK 
    lkaux ← P.pk                 // The linking information is P's public key 
    aux* ← H(aux || lkaux)       // Embed lkaux into attestation nonce 
    authQuote ← aAuthAttest(ppar', P.sk, aux*) 
    linkedQuote ← authQuote 
    Return (linkedQuote, lkaux) 

aLAttest(ppar, PK, P.sk, (s_1, s_2), aux): 
    // Attest hypervisor P on platform (s_1, s_2) with nonce aux 
    Parse PK as PK[1], PK[2]     // PK[1] is the set of all VM pks 
    Parse PK[1] as PK_1^1, PK_1^2, ..., PK_1^{|S_1|} 
                                 // PK_1^i contains the keys of all VMs on platform i 
    Set lkaux ← PK_1^k with k the index of s_1 in S_1 
                                 // lkaux is now the list of all VM keys on that platform 
    aux* ← H(aux || lkaux)       // Embed lkaux into a new attestation nonce 
    authQuote ← aAuthAttest(ppar', P.sk, aux*) 
    linkedQuote ← authQuote 
    Return (linkedQuote, lkaux) 

aLVerif(ppar, P.pk, linkedQuote, aux, lkaux): 
    // Verify attestation quote of party P 
    aux* ← H(aux || lkaux) 
    authQuote ← linkedQuote 
    Return aAuthVerif(ppar', P.pk, aux*, authQuote) 

aLLink(ppar, PK, Π_1, Π_2): 
    // Link VM quotes from Π_1 and the hypervisor quote from Π_2 
    Initialize AUX_vm ← ∅ 
    For each (P_j.pk, aux, linkedQuote, lkaux) ∈ Π_1: 
        // Each lkaux here is a VM public key. 
        Return 0 if aLVerif(ppar', P_j.pk, linkedQuote, aux, lkaux) returns 0 
        Return 0 if lkaux ≠ P_j.pk 
        // Linking fails if quotes fail to verify or authenticate each VM 
        Add lkaux to AUX_vm 
    Parse Π_2 as (P_j.pk, aux, linkedQuote, lkaux) 
    Return 0 if aLVerif(ppar', P_j.pk, linkedQuote, aux, lkaux) returns 0 
    AUX_hyp ← lkaux              // This lkaux is a list of VM public keys. 
    Return 0 if AUX_vm is not a subset of AUX_hyp 
    // Linking fails if the hypervisor's list of PKs does not include all VM keys. 
    Return 1 


Fig. 6. Our linked attestation scheme for platforms with 2 types of components: VMs 
(stored in S_1) and hypervisors (stored in S_2). Each type of component attests via a different 
aLAttest algorithm, the main difference between them being that the hypervisor 
embeds a list of public keys in its nonce. 

A VM (or a set of VMs) is considered to be linked to a hypervisor if, 
and only if, the following conditions hold simultaneously: (1) the attestations of 
all the purportedly-linked parties verify individually (if we run aAuthVerif it 
returns 1 for each individual attestation); (2) the public key that was successfully 
used to verify each of the VMs’ attestations is part of the auxiliary value lkaux 
forwarded by the hypervisor. 
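
The nonce derivation and the two linking conditions can be illustrated with a short Python sketch. It rests on several assumptions that are not from the paper: SHA-256 plays the role of H, public keys are fixed-length byte strings so that concatenation is unambiguous, authenticated attestation is replaced by an HMAC stand-in with pk equal to sk, and every function name is hypothetical.

```python
import hashlib
import hmac


def H(aux: bytes, lkaux: list) -> bytes:
    """aux* = H(aux || lkaux), where lkaux is a list of public keys."""
    h = hashlib.sha256(aux)
    for pk in lkaux:
        h.update(pk)
    return h.digest()


# Stand-in for aAuthAttest / aAuthVerif: an HMAC tag over the derived nonce.
# Here pk == sk, which a real signature scheme would not allow.
def auth_attest(sk: bytes, nonce: bytes) -> bytes:
    return hmac.new(sk, nonce, hashlib.sha256).digest()


def auth_verif(pk: bytes, nonce: bytes, quote: bytes) -> bool:
    return hmac.compare_digest(auth_attest(pk, nonce), quote)


def attest_vm(pk: bytes, sk: bytes, aux: bytes):
    lkaux = [pk]                      # a VM links with its own public key
    return auth_attest(sk, H(aux, lkaux)), lkaux


def attest_hypervisor(sk: bytes, vm_pks: list, aux: bytes):
    lkaux = list(vm_pks)              # the hypervisor embeds all VM keys
    return auth_attest(sk, H(aux, lkaux)), lkaux


def verify(pk: bytes, quote: bytes, aux: bytes, lkaux: list) -> bool:
    return auth_verif(pk, H(aux, lkaux), quote)


def link(vm_entries: list, hyp_entry: tuple) -> int:
    """Entries are (pk, aux, linked_quote, lkaux) tuples."""
    vm_keys = set()
    for pk, aux, quote, lkaux in vm_entries:
        # Condition (1) for each VM, plus the check that lkaux is its own key.
        if not verify(pk, quote, aux, lkaux) or lkaux != [pk]:
            return 0
        vm_keys.add(pk)
    pk, aux, quote, lkaux = hyp_entry
    if not verify(pk, quote, aux, lkaux):
        return 0
    # Condition (2): every VM key appears in the hypervisor's list.
    return int(vm_keys <= set(lkaux))
```

Because the list of VM keys is hashed into the nonce that the hypervisor signs, the list cannot be swapped after the fact without invalidating the hypervisor's quote.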


Correctness. The LinkedAtt scheme is built upon the AuthAtt scheme. There 
are two types of component to consider: VM and hypervisor. When a component 
is registered on a platform, its public key is appended to a list (PK_1 for VMs, 
and PK_2 for the hypervisor). The public key of a VM is appended to the quote 
in aLAttest and can be retrieved by the hypervisor. The latter can link the 
attestation to a public key via algorithm aLLink. We consider two cases to verify 
the correctness: (1) a VM (not compromised) is not registered on the platform, 
and (2) a component (VM or hypervisor) is compromised. For (1), the attestation 
will be correct since the component is not compromised, but the linking process 
will abort since the public key does not belong to PK_1. For (2), if a VM (or the 
hypervisor) is compromised, then the attestation will fail since the authenticated 
attestation is supposed to be correct (the aAuthAttest algorithm is executed to 
generate the quote). 


Security. We prove (see full version of our paper [4]) the security of our scheme 
with respect to a single permitted function, F_sign, that takes as input a message 
M from a message space M and outputs, when queried for a compromised party 
P, a signature on the message M with the private key P.sk. We demand that the 
message space M be disjoint from the range of any basic attestation scheme. 


Theorem 22 (Secure Linked Attestation). The LinkedAtt scheme is 
secure assuming the AuthAtt scheme is secure and the hash function H is collision 
resistant. 


2.4 Authorized Linked Attestation 


So far, attestation has been viewed as a primitive, run by a single party (which 
can be of various types) and outputting an attestation. However, one of the most 
important requirements of attestation is that the actual quote only be given to 
authorized parties — which we call attestation servers [18]. 

We will define an authorized linked attestation protocol, which allows an 
attestation server to act as a verification party in the attestation procedures. 
The same server will also be the one to generate the auxiliary values required 
for the attestation (this provides freshness to the protocol). The server will also 
be responsible for linking multiple attestations. 


Intuition. We provide a full formalization of authorized linked attestation 
below. However, we also believe it is useful to first give an intuitive understanding 
of what this primitive is and the security it wants to achieve. 

In authorized linked attestation we consider a (single) attestation server S and 
platforms consisting of several types of components (as shown for linked attes- 
tation). The server will keep track of an evolving state, which is initially empty. 
However, as the server starts to attest various components, at every execution 
of the authorized attestation protocol, the server will output a verdict (indicat- 
ing whether the component’s individual attestation has failed or succeeded) and 
may — or may not — update its internal state. Intuitively, the state is meant to 
contain the linking information provided by each of the attesting components. 
After a number of attestations, the server might have enough information in its 
state to decide if some of the components are linked or not. 

The security notion we require for authorized linked attestation is threefold: 
(1) we require that parties only provide attestation guarantees to the actual 
attestation server; (2) we require that the contents of the attestation be actually 
indistinguishable from random for all unauthorized parties; (3) we require a 
similar kind of linking security as demanded in linked attestation (see Sect. 2.3). 
However, as opposed to linked attestation, the adversary in this case can also 
play a Person-in-the-Middle role between honest components and the honest 
server, or it may attempt to replay messages or impersonate one or both parties. 
Finally, the adversary will have oracle access to permitted functions of the secret 
key of any compromised component (this oracle access is parametrized in terms 
of a function space F of allowed functions). 

Formalization. The complete formalization of authorized linked attestation 
is given in the full version [4]. Components on platforms are either VMs or 
hypervisors. In addition, we consider a server S, which stores a tuple consisting of a 
public and a private key S.pk = pk and S.sk = sk respectively, and a state S.st. 
Parties interact with each other in sessions, which are run by an instance of the 
server and an instance of a given component. Instances of each party use that 
party’s long-term public and private keys, as well as potential local randomness, 
such as instance-specific nonces. An instance of a component and an instance 
of the server are partnered if they essentially run the same session (formally, if 
they share a session identifier, which consists of the concatenation of a number 
of session-specific values). 

Authorized linked attestation is defined as the tuple ALA = (ASetup, AReg, 
AAttest, aALink). The first, second, and last of these are algorithms, while 
AAttest is a protocol. The setup algorithm generates parameters (keys and public 
system values) for all the involved parties. The registration algorithm allows 
the VMs and hypervisor on a single platform (defined as sets s_1 for the VMs 
on the platform and s_2 for the hypervisor) to be associated with each other. 
For administration purposes, the public keys of all VMs on a platform (i.e., 
all VMs in some s_1) and the public key of the platform’s hypervisor 
(the hypervisor in the corresponding s_2) are stored respectively in subsets 
PK_1 and PK_2 (i = 1 for VMs and i = 2 for the hypervisor). Together, all the subsets 
PK_i for all the components form a set PK[i] (for i = 1, 2). 

The authorized attestation protocol is run by an instance of a component 
and an instance of the server, yielding, for the component, an acceptance bit 
(corresponding to the authentication of its partner as the authorized server) and 
for the server, a tuple verdict, S.st: the verdict verdict is 1 or 0 depending on 
whether the component attested successfully or not, and the state is an update 
of the server’s current internal state. Finally, the server state can be used on a 
subset of components in the aALink algorithm, yielding either 1 (the components 
are linked) or 0 otherwise. 


Construction. Our construction of the ALA primitive can be seen in Fig. 7. 
We consider the existence of an underlying LinkedAtt scheme that we use 
for aLSetup, aLReg and aLLink in a straightforward manner. However, 
the aLAttest algorithm is no longer a primitive, but a protocol between two 
instances of two parties, P and Q. For simplicity of exposition, we assume that 
the instance of Q is the server attesting the component identified by P. 


ASetup(1^λ): 
    ppar ← aLSetup(1^λ) 
    Create S 
    (S.pk, S.sk) ← aSigKGen(1^λ) 
    Return (ppar, S.pk, S.sk, S) 

AReg(s_1, s_2): 
    {(PK_1, SK_1), (PK_2, SK_2)} ← aLReg(s_1, s_2) 
    Return {(PK_1, SK_1), (PK_2, SK_2)} 

aALink(ppar, PK, S.st, s_1, s_2): 
    Parse S.st as Π_1, Π_2 
    Return aLLink(ppar, PK, Π_1, Π_2) 

AAttest(ppar, PK, π_P, π_S): 

    Component                                                Server 
                        Establish TLS channel 
                                                             aux ←$ AUX 
                  <---- AttestationRequest: aux ---- 
    (linkedQuote, lkaux) ← aLAttest(ppar, PK, P.sk, (s_1, s_2), aux) 
                  ---- AttestationResponse: (linkedQuote, lkaux) ----> 
                                                             verdict ← aLVerif(ppar, P.pk, linkedQuote, aux, lkaux) 
                                                             Add (P_j.pk, aux, linkedQuote, lkaux) to Π_i in S.st 


Fig. 7. Our authorized linked attestation scheme for 2 types of components. 

The protocol proceeds as follows. First, P and Q execute the TLS protocol, 
with P playing the role of the client and Q playing the role of the server. The 
role of the TLS protocol is two-fold: first, P authenticates the server, so that 
it can determine whether this party is allowed to obtain attestation data. 
Second, it leads to the establishment of a secure channel, such that the following 
messages can be passed on in a secure manner. Once the traffic keys are established, 
the protocol continues as follows. First, the server samples a nonce aux uniformly 
at random, which is embedded in the first message of the protocol, 
AttestationRequest. In response, the party P executes the aLAttest algorithm 
and the output, consisting of a linkedQuote and the linkage information lkaux, is 
then sent to the server. The server will subsequently update its state. 

In order for two components to be linked by the server successfully, the following 
conditions have to be met. First, the two components’ attestations must 
be valid (their associated verdicts equal 1). Second, the VM’s lkaux must be 
contained in the hypervisor’s lkaux; essentially, the key that the VM used as part 
of its attestation must be found in the lkaux provided by the hypervisor. 

We note that if the server has at some point accepted the attestation of a 
component (thus updating its state to add the linking information), and if later a 
failed attestation occurs with respect to that component, the server updates state 
as follows: it ignores the linking information provided in the second attestation; 
and it removes prior linking information provided by that component. 
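
The server-side bookkeeping described above can be sketched as follows. This is a hedged illustration rather than the authors' server: the TLS channel is omitted, `verify` is an assumed callable standing in for aLVerif, and the class and attribute names (`AttestationServer`, `state`, `pending`) are hypothetical.

```python
import secrets


class AttestationServer:
    """Keeps linking information per component across attestation runs."""

    def __init__(self, verify):
        # Assumed callable: verify(pk, quote, aux, lkaux) -> bool.
        self.verify = verify
        self.state = {}    # pk -> (aux, quote, lkaux) of the last accepted run
        self.pending = {}  # pk -> nonce sent in the last AttestationRequest

    def request(self, pk: bytes) -> bytes:
        """Sample a fresh nonce for the AttestationRequest message."""
        aux = secrets.token_bytes(16)
        self.pending[pk] = aux
        return aux

    def response(self, pk: bytes, quote: bytes, lkaux: list) -> int:
        """Process the component's answer and return the verdict."""
        aux = self.pending.pop(pk, None)
        ok = aux is not None and self.verify(pk, quote, aux, lkaux)
        if ok:
            self.state[pk] = (aux, quote, lkaux)
        else:
            # Failed run: ignore the new linking information and
            # drop whatever this component provided earlier.
            self.state.pop(pk, None)
        return int(ok)
```

Keying the state by public key makes the removal rule a single dictionary operation, and sampling the nonce inside `request` ensures each session is checked against its own fresh value.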


Security. There are three fundamental properties we want ALA schemes to have: 
an authenticity guarantee for the attestation server (authorization); a confiden- 
tiality guarantee for the contents of the attestation (indistinguishability); and a 
linkability guarantee for honestly-behaving components (linking-security). The 
first notion, authorization, captures the fact that before reaching an accepting 
state, a (non-server) party must be sure that it is speaking to the legitimate 
server (game AuthSec_{λ,F}). The second notion, indistinguishability, essentially 
covers Person-in-the-Middle confidentiality for the attestation protocol (game 
AuthInd_{λ,F}). The last property, linking-security, refers to the fact that no 
PitM adversary with the ability to compromise components can convince an 
attestation server that a component is linked to another if that is not the case 
in reality (game AuthLink_{λ,F}). Although this last property might seem similar 
to the security notion for our linked attestation primitive, there is one impor- 
tant difference between the two: in linked attestation the adversary has access 
to essentially two ways to generate an attestation (depending on whether the 
component is honest or compromised), whereas in authorized linked attestation 
the adversary will have more leeway in combining attestation material across 
sessions. The stronger adversary in this section will thus make for a stronger 
primitive in the end. The three security games are defined in the full version [4]. 


Theorem 23. Our construction is AuthSec_{λ,F_sign} secure if the TLS protocol provides 
server authentication: Pr[A wins AuthSec_{λ,F_sign}] ≤ ε_{TLS-auth}. 


Theorem 24. Our construction is AuthLink_{λ,F_sign} secure if the underlying primitive 
LinkedAtt is LinkSec_{λ,F_sign} secure and TLS is at least (s)ACCE secure. 


Theorem 25. Our construction is AuthInd_{λ,F_sign} secure if the TLS channel provides 
(minimally) (s)ACCE security. Let q_sessions be the number of sessions. Then 

    Pr[A wins AuthInd_{λ,F_sign}] ≤ (1 / q_sessions) · ε_{TLS-sACCE}. 


3 Implementation 


We provide a proof-of-concept implementation of our authorized linked attestation 
scheme. The implementation consists of three parts: a client for the hypervisor, 
a client for the virtual machines, and an attestation server written in 
Python 3. We do not consider the underlying NFV or cloud infrastructure, since 
our scheme abstracts those environments and can be used in any kind of 
deep-attestation scenario. Therefore, any computer equipped with a TPM 2.0 (which 
can also be emulated) and which has virtualization capacities suffices for the purposes 
of our implementation. We provide our code as well as a detailed tutorial 
on how to install and configure the infrastructure [3]. 


The Infrastructure. We summarize our testing architecture in Fig. 8 (note that some 
of our tests use more than 2 VMs, up to 55). Our hypervisor is a laptop running Ubuntu 
20.04.3 (kernel version 5.11.0-40) with an Intel i7-10875H CPU, 32 GB RAM and a 
STMicroelectronics TPM. We used KVM to turn this laptop into a hypervisor. For high 
attestation performance, we used a full virtual TPM implementation, using QEMU [1] 
with libtpms 0.7 [8] and swtpm 0.5 [20]. 


Fig. 8. Architecture for tests. 

All virtual machines are QEMU virtual machines (version 4.2.1) with 1 core 
and 512 MB of RAM running Fedora 35 Cloud. The VMs as well as the virtual TPM 
instances are managed using Vagrant and the Vagrant-Libvirt plugin. 

The hypervisor, server, and VMs communicate through a private network 
created with Vagrant. Thus, connection time is not considered in our tests. 

To communicate with the TPM we used tpm2-tss, tpm2-abrmd and tpm2-tools 
from tpm2-software [2]. Note that the tpm2-tss project implements the 
TPM software stack (TSS), which is an API specified by the Trusted Computing 
Group to interact with a TPM. The tpm2-abrmd implements the access broker 
and resource manager, which handles concurrent access to the TPM and manages 
the memory of the TPM by swapping objects in and out as needed (hardware TPMs 
have limited memory). 


Table 1. Minimum, median, mean and maximum time in seconds for attestation of a 
hypervisor and a virtual machine over 100 trials. 


           | min   | median | mean   | max 
Hypervisor | 3.801 | 11.940 | 13.896 | 40.762 
VM         | 0.337 | 0.416  | 0.422  | 0.572 

The attestation server is also a virtual machine, with the same characteristics 
as those above. This allows us to test our implementation on a single machine. 
We establish a secure connection between the client and the server by using 
Python’s SSL library. 
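A client-side TLS setup of this kind can be sketched with Python's standard ssl module as follows. This is an illustration only: the helper name make_client_context and its optional CA-file argument are assumptions and are not taken from the released code [3].

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that authenticates the attestation server."""
    # create_default_context turns on certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A client would then wrap its TCP socket with ctx.wrap_socket(sock, server_hostname=...) before sending attestation data.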


Tests. We perform three types of experiments. The first is a comparison of 
hypervisor attestation time and VM attestation time. Although both those pro- 
cesses have some (very small) amount of noise, our values faithfully show the 
difference between attesting a component through the physical TPM — hypervi- 
sor attestation — and attesting it by using a virtual TPM — VM attestation. 

We ran 100 attestations for the hypervisor and 100 attestations for a virtual 
machine. The results have high variance, so Table 1 presents the minimum,
median, mean, and maximum value of those 100 trials. As expected, the time for
an attestation using a hardware TPM is much higher than using a vTPM.
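The four statistics reported in Table 1 can be computed from a list of measured run-times as sketched below. The function name summarize and the sample values are hypothetical; the sample is not the data behind Table 1.

```python
import statistics
from typing import Dict, Sequence


def summarize(times: Sequence[float]) -> Dict[str, float]:
    """Return min, median, mean and max of a series of timings (seconds)."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "mean": statistics.fmean(times),
        "max": max(times),
    }


# Hypothetical measurements, NOT the data behind Table 1.
sample = [3.9, 11.2, 12.6, 14.1, 40.3]
print(summarize(sample))
```

Reporting the median next to the mean is useful here because a few slow hardware-TPM runs can pull the mean upwards.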

As our second and third experiments we wanted to see how the overall runtime
of our scheme evolves with the number of virtual machines that need to be
attested, when the attestation is sequential or parallelized for the VM
attestations. In both cases, each experiment first runs the attestation of the
hypervisor, and then (sequentially or in parallel) the attestation of a varying
number of VMs (up to a maximum of 55). The results are plotted in Fig. 9. We
note that the runtime is not entirely linear. This is because in experiments 2
and 3 the initial attestation of the hypervisor (which only occurs once) takes
longer than the subsequent VM attestations.

Fig. 9. Attestation time (horizontal axis: number of VMs).


Comparison to Single-Channel Attestation. We did not implement single- 
channel attestation. However, since we have implemented hypervisor and VM 
attestations, we can theoretically estimate the run-time of single-channel attes- 
tation for a varying number of VMs — which we plot in Fig. 9. Indeed, a single- 
channel attestation process for a single VM includes a VM attestation and a 
hypervisor attestation. If we want to run it for 2 VMs, then we need to perform 
2 hypervisor attestations and 2 VM attestations. This cannot be easily
parallelized either, because the same TPM has to run the attestations. This
yields a much higher run-time, as depicted in Fig. 9.
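The estimate described above can be sketched as a simple cost model. This is an illustrative simplification, not the script used to produce Fig. 9: it plugs in the mean values of Table 1, ignores noise and network time, and the function names are our own.

```python
# Mean attestation times from Table 1, in seconds.
T_HYP = 13.896  # one hypervisor attestation (hardware TPM)
T_VM = 0.422    # one VM attestation (vTPM)


def linked_sequential(n_vms: int, t_hyp: float = T_HYP, t_vm: float = T_VM) -> float:
    """One hypervisor attestation, then n_vms sequential VM attestations."""
    return t_hyp + n_vms * t_vm


def single_channel(n_vms: int, t_hyp: float = T_HYP, t_vm: float = T_VM) -> float:
    """Every VM attestation also requires its own hypervisor attestation."""
    return n_vms * (t_hyp + t_vm)


for n in (1, 10, 55):
    print(n, round(linked_sequential(n), 3), round(single_channel(n), 3))
```

Under this model the two approaches coincide for a single VM, and the gap grows by one hypervisor attestation for every additional VM.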


Comparison to Multi-channel Attestation. Although our method follows 
basic multi-channel attestation approaches, we do add an extra computation (a 
hash function computation) compared to traditional multi-channel attestation. 
In addition, we require a little extra memory overhead for both the attestation 
server and for each platform, so that the additional attestation keys are stored 
for each VM. There is also a slight transmission overhead, since those keys are 
also sent upon attestation. However, the transmission overhead is negligible since 
it only appears for the hypervisor attestation (which occurs only once). 


4 Conclusions and Future Work 


We proposed a layer-binding mechanism for deep attestation that avoids the
complexity of single-channel attestation. Our construction achieves the best of
both worlds, with a complexity similar to that of multi-channel attestation,
but with the strong linkage properties provided by single-channel attestation.

We accompany our construction by a proof-of-concept implementation that 
clearly shows the viability and scalability of our solution, especially if VM attes- 
tations are run in parallel. 

In addition, we are the first to present a full, formal treatment of our new 
protocol, which we call authorized linked attestation. Our construction of autho- 
rized linked attestation is modular, built on primitives which have increasingly 
stronger properties. Our underlying assumption is a primitive called basic attes- 
tation. We show that in order to be able to prove security, we need that attes- 
tations be able to reflect compromise of the component. In addition, we rely on 
a collision-resistant hash function, an EUF-CMA-secure signature scheme, and 
the sACCE security of a TLS protocol (having AKE properties would be better). 

However, our model (and scheme) does not immediately account for other 
features of virtual infrastructures, such as privacy CAs, migrating VMs, multiple 
hypervisors managing the same VM, or even replacing TPMs. These aspects are 
left as future work. 
Abstract. In related-key attacks (RKA), an attacker modifies a secret
key stored in a device by tampering or fault injection and observes the 
evaluation output of the cryptographic algorithm based on this related 
key. In this work, we show that the dual system encryption methodology 
of Waters (Crypto 2009) fits well with RKA security. We apply sim- 
ple modifications to a regularly-secure identity-based encryption (IBE) 
scheme (TCC 2010) constructed through dual system to achieve RKA 
security for rational functions, which is beyond the polynomial barrier of 
Bellare et al.’s framework (Asiacrypt 2012). We achieve security by push- 
ing the complexity of RKA directly down to the underlying intractabil- 
ity assumption. We also discuss how to extend it to a hierarchical IBE 
scheme that remains secure against RKA over identity-based secret keys 
beyond the master secret, albeit under some structural constraints. 


Keywords: Identity-based encryption · Related-key attacks · Dual
system encryption


1 Introduction 


Related-key attacks (RKA) are useful in attacking cryptosystems. One of the 
earliest targets is blockciphers [4,5]. Under RKA, an adversary can obtain input- 
output samples under not only the target key but also some related keys. Security 
against RKA is a popular design goal since a key stored in memory could be 
modified by tampering and fault injection. RKA against public-key primitives 
could cause broader damage to many users, e.g., RKA could be mounted on 
a signing key of a certificate authority or SSL server or a master key for an 
identity-based encryption (IBE) system. Similarly, an RKA on the identity-based
secret keys of hierarchical IBE (HIBE) affects the security of all descendant
users.

Bellare et al. [2] showed how to construct $\Phi$-RKA secure signatures,
public-key encryption, and IBE from $\Phi$-RKA secure blockciphers or
pseudorandom functions (PRF). The parameter $\Phi$ is a function family from
which the adversary can choose a related-key derivation function; the chosen
function is applied to the secret key, and its output is used to execute the
keyed function of the cryptographic primitive. The RKA resilience of the
resulting IBE is thus subject to that of the underlying blockcipher/PRF. Many
existing (H)IBE schemes are not linear-RKA-secure [3]. Bellare et al. [3]
proposed a framework to convert regularly-secure IBE schemes into $\Phi$-RKA
secure ones. Their framework requires expanding the identity space since the
master public key mpk is appended to the user's identity. So, it is only
applicable to schemes that treat the size of the identity space as a tunable
system parameter. The same applies to the recent result of Fujisaki and
Xagawa [18], which achieves RKA security for efficiently invertible related-key
derivation functions¹. Their best instantiation is from an extension of Waters
IBE [29] for $\Phi$ being the class of polynomials with degree $d$, which
increases the number of multiplications during encryption and key generation by
$\log(|G|)/2$ times on average since mpk is a group $G$ element, and the public
parameter is of size $O(d)$ for the modularity of the framework [3].

This work provides affirmative answers to a few unsolved problems. 


- Is it possible to establish the $\Phi$-RKA security of an IBE scheme for
  different $\Phi$ under a family of assumptions instead of following prior
  frameworks [3, 18]?

- Is there any practical IBE scheme in the standard model that “natively”
  attains $\Phi$-RKA security, for $\Phi$ being more general than the class of
  degree-$d$ polynomials, with a short public key and short ciphertexts?

- Can such an RKA-secure IBE scheme be extended to RKA-secure HIBE?

- Can we achieve RKA security when the adversary can modify not only the
  master secret key but also the identity-based secret keys?


Technical Overview. Our core technique is dual system encryption (DSE) pro- 
posed by Waters [30], a paradigm for constructing adaptively secure IBE in the 
standard model with short public parameters. To our knowledge, its potential 
in establishing RKA security is yet to be explored. We propose a
$\Phi$-RKA-secure IBE scheme in the standard model, where $\Phi$ is a general
class of rational functions. It is almost as efficient as the underlying
standard (H)IBE schemes [24]
in the famous commutative-blinding framework (featuring many extensions [9]). 
They are practically efficient (purely pairing-based without heavyweight tools or 
any bit-level cryptographic processing) and feature a short public key and short 
ciphertexts (independent of the length of an identity string). 

RKA in IBE allows the adversary to adaptively issue polynomially many
queries to a tampered key extraction oracle, namely, Extract($\phi$(msk), ID)
for identity ID and function $\phi \in \Phi$, where msk is the master secret.
When $\phi$ is an identity map and ID is the challenge identity, no scheme can
remain secure. We observe that enforcing this natural restriction is all we
need, without further restrictions on $\Phi$, to transition from normal to
semi-functional (SF) output, a core ingredient for a typical security proof
using DSE. In this sense, DSE fits nicely with RKA security (beyond its
applications in a related field of leakage resilience [23,31]).

1 Fujisaki and Xagawa [19] provided a counterexample refuting the security proof of
Qin et al. [26] for their RKA-secure IBE scheme in the split-state model (i.e.,
tampering can only be applied to the two parts of the encoded secret state
independently).


Implication. Our observation allows us to achieve RKA security beyond
state-of-the-art polynomial RKA, particularly rational functions. We directly
relate the class of RKA functions $\Phi$ to an intractability assumption
related to $\Phi$. We can then formulate the $\Phi$-assumption to accept
functions beyond polynomials, e.g., any functions based on group operations. It
is easier to analyze whether a specific function is allowed by checking the
validity of the related $\Phi$-assumption. For the existing $\Phi$-RKA secure
IBE framework [3], there seems to be no easy way to check if it remains secure
for $\phi'$ outside of $\Phi$. We also hope that our approach can inspire
generalizations, e.g., investigating more exotic classes of RKA functions².

Another notable benefit of our direct approach is that the “artifacts” required 
for key malleability just stay in the security proof but do not manifest in the 
actual construction. This helps the extensibility of our scheme, especially when 
IBE is shown to be versatile [9]. As a showcase, we extend our scheme to HIBE. 


HIBE. HIBE features a Delegate algorithm taking in a parent secret key $sk_p$
to generate a child secret key with $id_j$ appended at the end of the parent
identity. There are two classes of RKA functions, $\Phi_e$ and $\Phi_d$, one
for Extract and one for Delegate. Non-trivial RKAs can be launched over the
identity-based secret keys beyond the master secret, i.e., giving the value
Delegate($\phi(sk_p)$, $id_j$) to the adversary. Our result for HIBE does not
reduce $\Phi_d$ to the underlying intractability assumption; however, it comes
with some structural restrictions³. Removing them may require techniques
beyond DSE, which we leave as an open problem.


2 Our $\Phi$-Oracle Bilinear Diffie-Hellman Assumption


2.1 Composite Order Bilinear Groups and Existing Assumptions 


Let $\mathcal{G}$ be a bilinear group generator, which takes a parameter
$1^\lambda$ as input where $\lambda \in \mathbb{N}$, and outputs a description
of a bilinear group context $(N = p_1 p_2 p_3, G, G_T, \hat{e})$, where
$p_1, p_2, p_3$ are distinct $\lambda$-bit primes, $G$ and $G_T$ are cyclic
groups of order $N$, and $\hat{e}: G \times G \to G_T$ is a bilinear map such
that $\forall g, h \in G$ and $a, b \in \mathbb{Z}_N$,
$\hat{e}(g^a, h^b) = \hat{e}(g, h)^{ab}$; and $\hat{e}(g, g)$ generates $G_T$
if $g$ is a generator of $G$.

We use $G_{p_i}$ to denote the subgroup of order $p_i$ in $G$ ($i = 1, 2, 3$).
Let $g_i$ be a generator of the subgroup $G_{p_i}$. Note that for all
$h_i \in G_{p_i}$ and $h_j \in G_{p_j}$, if $i \neq j$,
$\hat{e}(h_i, h_j) = 1$. We use $G_{p_1 p_2}$ to denote the subgroup of order
$p_1 p_2$ in $G$. For all $T \in G_{p_1 p_2}$, $T$ can be written uniquely as
the product of an element of $G_{p_1}$ and an element of $G_{p_2}$. We refer to
these elements as the “$G_{p_1}$ part of $T$” and the “$G_{p_2}$ part of $T$”
respectively. We also define $G_{p_2 p_3}$ and $G = G_{p_1 p_2 p_3}$ similarly.
Below are two complexity assumptions to be used in our security proofs.

2 In the most optimistic case, our construction might turn out to remain secure
against an even broader class of RKA attacks, but just no one has explicitly
analyzed the hardness of the corresponding version of the assumption so far.

3 Special restrictions may also apply to existing schemes. For example, Goyal et
al. [20] proposed selectively secure $\Phi$-RKA secure signatures, where $\Phi$
is the set of polynomials which are distinct even “ignoring the constant term”
(i.e., the difference between any two polynomials should not just be in the
constant term).
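Assumption 1 [24]. For the bilinear group generated by $\mathcal{G}(1^\lambda)$,
given $g \xleftarrow{\$} G_{p_1}$, $X_3 \xleftarrow{\$} G_{p_3}$ and $T$, any
probabilistic polynomial-time (PPT) algorithm $\mathcal{A}$ can distinguish
whether $T \in G_{p_1 p_2}$ or $T \in G_{p_1}$ with only negligible advantage
in $\lambda$.

Assumption 2 [24]. For the bilinear group generated by $\mathcal{G}(1^\lambda)$,
given $g \xleftarrow{\$} G_{p_1}$, $X_3 \xleftarrow{\$} G_{p_3}$,
$X_1 X_2 \xleftarrow{\$} G_{p_1 p_2}$, $Y_2 Y_3 \xleftarrow{\$} G_{p_2 p_3}$,
and $T$, any PPT algorithm $\mathcal{A}$ can distinguish whether
$T \in G_{p_1 p_3}$ or $T \in G$ with only negligible advantage in $\lambda$.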


2.2 $\Phi$-Oracle Decisional Bilinear Diffie-Hellman Assumption


We introduce a non-static⁴ assumption based on the decisional bilinear
Diffie-Hellman (DBDH) problem. It features an oracle taking a function $f$ that
outputs values embedding $f(\alpha)$, where $\alpha$ is a secret exponent in
the problem instance.
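$\Phi$-Oracle DBDH Assumption. Given a group generator $\mathcal{G}$, we define

Experiment $\Phi\text{-ODBDH}_{\mathcal{G},\mathcal{A},b}(1^\lambda)$:
  $(N = p_1 p_2 p_3, G, G_T, \hat{e}) \xleftarrow{\$} \mathcal{G}(1^\lambda)$, $\alpha, s \xleftarrow{\$} \mathbb{Z}_N$, $g, v \xleftarrow{\$} G_{p_1}$,
  $X_2, Y_2, Z_2 \xleftarrow{\$} G_{p_2}$, $X_3 \xleftarrow{\$} G_{p_3}$, $T_0 = \hat{e}(g, g)^{\alpha s}$, $T_1 \xleftarrow{\$} G_T$.
  Return $b' \leftarrow \mathcal{A}^{\mathcal{O}(\cdot)}(N, G, G_T, \hat{e}, g, g^{\alpha} X_2, X_3, g^{s} Y_2, Z_2, v, v^{\alpha}, v^{s}, T_b)$,

where on input $f \in \Phi$, the oracle $\mathcal{O}$ outputs
$(g^{f(\alpha)} W_2, v^{f(\alpha)} V_2)$ for $W_2, V_2 \in G_{p_2}$ freshly
chosen uniformly at random. We define the advantage of an algorithm
$\mathcal{A}$ in breaking the $\Phi$-oracle DBDH assumption to be

$\mathsf{Adv}_{\mathcal{G},\mathcal{A}}(\lambda) := \left| \Pr[\Phi\text{-ODBDH}_{\mathcal{G},\mathcal{A},1}(1^\lambda) = 1] - \Pr[\Phi\text{-ODBDH}_{\mathcal{G},\mathcal{A},0}(1^\lambda) = 1] \right|$.

We say that the bilinear group context from $\mathcal{G}$ satisfies the
$\Phi$-oracle DBDH assumption if $\mathsf{Adv}_{\mathcal{G},\mathcal{A}}(\lambda)$
is a negligible function of $\lambda$ for any PPT algorithm $\mathcal{A}$.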

The non-interactive part of our assumption is similar to Assumption 3 in the
literature [24], with elements $v, v^{\alpha}, v^{s}$ added (and elements from
$G_{p_2}$ and $G_{p_3}$ are samplable). Intuitively, this would not help much
in deciding $\hat{e}(g, g)^{\alpha s}$ since the discrete logarithm between $g$
and $v$ is unknown. See Sect. 2.4 for details.

Looking ahead, $\alpha$ will serve as the master secret msk in our scheme. The
simulator does not know $\alpha$ only in the last transition (from an SF
ciphertext to a random ciphertext). To simulate an SF key for RKA of
$f \in \Phi$, we only need $g^{f(\alpha)} W_2$, which is “protected” by $W_2$,
a random $G_{p_2}$ element. Therefore, $\Phi$ can include a large class of
functions based on group operations in $\mathbb{Z}_N$.


4 Obviously, static assumptions are weaker than non-static counterparts.


Don’t Tamper with Dual System Encryption 423 


2.3 Turning it into a Non-interactive Assumption 


Our $\Phi$-oracle DBDH assumption can be non-interactive for specific $\Phi$.
For example, if $\Phi$ is the class of affine functions, $g^{\alpha} X_2$ and
$v^{\alpha}$ in the problem instance suffice to derive
$(g^{f(\alpha)} W_2, v^{f(\alpha)} V_2)$ for $f \in \Phi$, where
$W_2, V_2 \in G_{p_2}$.

For $\Phi$ being the class of polynomials with maximum degree $d$, we can
simply answer all queries if the problem instance also contains
$g^{\alpha^2}, \ldots, g^{\alpha^d}$ together with the analogous powers for $v$.
This assumption is similar to the $d$-extended decision bilinear Diffie-Hellman
assumption⁵ used for an existing polynomial-RKA secure IBE scheme [3]. For
$\Phi$ being the class of rational functions
$f(\alpha) := P(\alpha)/Q(\alpha)$, where $P, Q$ are polynomials of degree at
most $d$, $f(\alpha)$ can be computed using group operations on $\alpha$.
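To make the affine case concrete, the following worked derivation sketches one natural way to answer an oracle query from the problem instance alone. It is an illustration under stated assumptions rather than the construction used in the proofs: we write $f(x) = ax + b$ with $a, b \in \mathbb{Z}_N$, assume that $Z_2$ generates $G_{p_2}$ (which fails only with probability $1/p_2$), and let the simulator sample fresh $r, r' \in \mathbb{Z}_N$. The names $a, b, r, r'$ are ours.

```latex
\begin{align*}
(g^{\alpha} X_2)^{a} \cdot g^{b} \cdot Z_2^{r}
  &= g^{a\alpha + b} \cdot \bigl(X_2^{a} Z_2^{r}\bigr)
   = g^{f(\alpha)} W_2, & W_2 &:= X_2^{a} Z_2^{r} \in G_{p_2},\\
(v^{\alpha})^{a} \cdot v^{b} \cdot Z_2^{r'}
  &= v^{a\alpha + b} \cdot Z_2^{r'}
   = v^{f(\alpha)} V_2, & V_2 &:= Z_2^{r'} \in G_{p_2}.
\end{align*}
```

Since $r$ and $r'$ are fresh and independent, $W_2$ and $V_2$ are uniform and independent in $G_{p_2}$, which matches the output distribution of the oracle.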


2.4 Generic Security from the Uber-Assumption Family 


Boyen [10] offered an exposition of Boneh, Boyen, and Goh’s “uber-assumption” 
family [7] for analyzing the validity and strength of pairing assumptions in the 
generic-group model (GGM). GGM was introduced by Shoup [28] to study 
generic-group algorithms that act independently of the group representation. 
In GGM, algorithms are given access to group elements via a randomly selected 
representation. In the abstract computation model of Maurer [25], the model 
this paper uses, an algorithm interacts with a black box via interfaces capturing 
the real-world atomic computations (e.g., group operation and pairing). The box 
only reports when any two computed elements are equal (i.e., “collide”).

We recall the following master theorem (for a product of three primes) [22].

Theorem 1. Let $(N = p_1 p_2 p_3, G, G_T, \hat{e}) \xleftarrow{\$} \mathcal{G}(1^\lambda)$.
Let $\{A_i\}$ be random variables over $G$, and let $\{B_i\}, T_0, T_1$ be
random variables over $G_T$, where all random variables have a degree at most
$t$. Consider the below experiment in the GGM:

Algorithm $\mathcal{A}$ is given $N, \{A_i\}, \{B_i\}$ and $T_b$ for a random
bit $b$, and outputs $b'$. $\mathcal{A}$'s advantage is the absolute value of
the difference between $\Pr[b' = b]$ and $1/2$.

Suppose $T_0$ and $T_1$ are independent of
$\{B_i\} \cup \{\hat{e}(A_i, A_j)\}$. Given a PPT (in $\lambda$) algorithm
$\mathcal{A}$ that performs at most $q$ group operations and has advantage
$\delta$, we can output a non-trivial factor of $N$ in PPT with a probability
at least $\delta - O(q^2 t / 2^{\lambda})$.


Appendix A will argue the generic security of our $\Phi$-oracle DBDH assumption
when $\Phi$ is the class of polynomial and rational functions.


2.5 Beyond Group Operations 


Generalizing, one can consider versions of $\Phi$ that contain bitwise
operations, such as bitwise XOR with a random Gaussian white noise,
bit-shifting, or bitwise one-way permutation of $\alpha$ (cf., auxiliary input
model in leakage-resilient cryptography [16]). The class of functions $\Phi$
can also be a combination of any function described above, together with affine
or polynomial functions. Of course, the class of functions $\Phi$ should be
chosen carefully such that for all $f \in \Phi$,
$(g^{f(\alpha)} W_2, v^{f(\alpha)} V_2)$ should not help any PPT adversary
decide if $T = \hat{e}(g, g)^{\alpha s}$. However, it is more difficult to
analyze the validity of the $\Phi$-oracle DBDH assumption if the class of
functions $\Phi$ involves bitwise operations. An obstacle is that these bitwise
operations do not naturally match the group operations. One should be very
careful about checking the validity of the assumption if bitwise operations are
involved. Meanwhile, we note that RKA security against these kinds of attacks
generally appears to require some novel techniques. For example, in the case of
pseudorandom functions, the latest results for establishing RKA-security for
XOR rely on post-zeroizing multilinear maps [1].

5 This belongs to q-type assumptions, commonly known for more than a decade.

6 A random variable expressed in this way has degree t if the maximum degree of
any variable is t [22].


3 Security Model for (Hierarchical) ID-Based Encryption 


An IBE scheme consists of four PPT algorithms:

- Setup: On input of a security parameter $1^\lambda$, it generates a public
  system parameter (an implicit input of all other algorithms) that defines a
  message space $\mathcal{M}$, a master public key mpk, and a master secret key
  msk.

- Extract: On input of msk and an identity ID, it outputs an identity-based
  secret key $sk_{ID}$.

- Enc: On input of mpk, an identity ID, and a message $M$ from the message
  space $\mathcal{M}$, it outputs a ciphertext $\mathfrak{C}$.

- Dec: On input of mpk, the identity-based secret key $sk_{ID}$ for the
  identity ID and $\mathfrak{C}$, it outputs a message $M$ or $\bot$
  symbolizing decryption failure.

HIBE is an IBE scheme where the identity ID can be a vector of strings
$(id_1, \ldots, id_i)$, possibly with a bound $H$ over $i$, with an extra
delegation algorithm:

- Delegate: On input of mpk, $sk_{id_1, \ldots, id_j}$, and $id_{j+1}$, it
  outputs $sk_{id_1, \ldots, id_{j+1}}$.


Confidentiality is modeled via an indistinguishability-based game against
adaptive chosen-plaintext attacks (IND-ID-CPA) between the challenger and
the adversary $\mathcal{A}$. To save space, we give a single definition
capturing either $\Phi_e$-RKA security for IBE [2] or $(\Phi_e, \Phi_d)$-RKA
security for HIBE. The act of tampering or fault injection is applied to the
secret used for deriving the output of the corresponding oracle. In particular,
a tampered key extraction oracle should not be confused with getting keys for
“fake” or “related” identities.


1. Setup. The challenger runs (mpk, msk) $\leftarrow$ Setup($1^\lambda$) and
   gives mpk to $\mathcal{A}$.
2. Phase 1. $\mathcal{A}$ can issue queries to the following oracles.
   - Extraction oracle $\mathcal{EO}(\phi, ID)$: On input of a function
     $\phi \in \Phi_e$ and an identity ID, it returns a secret key
     $sk_{ID} \leftarrow$ Extract($\phi$(msk), ID).
   - Delegation oracle $\mathcal{DO}(\phi, ID_p = (id_1, \ldots, id_{j-1}), id_j)$:
     For the case of HIBE, on input of a function $\phi \in \Phi_d$, a parent
     identity $ID_p$, and a child identity $id_j$, it returns a secret key
     $sk_{ID} \leftarrow$ Delegate(mpk, $\phi$(Extract(msk, $ID_p$)), $id_j$).
3. Challenge. $\mathcal{A}$ sends two messages $M_0, M_1 \in \mathcal{M}$ and
   an identity $ID^*$ to the challenger. The challenger picks a random bit $b$
   and computes $\mathfrak{C}^* \leftarrow$ Enc(mpk, $ID^*$, $M_b$). The
   challenger sends $\mathfrak{C}^*$ to $\mathcal{A}$.
4. Phase 2. $\mathcal{A}$ can keep issuing queries as in phase 1.
5. Output. $\mathcal{A}$ returns a guess $b'$ of $b$.


$\mathcal{A}$ wins the game if $b' = b$. We require that the following hold:

1. There was no query to $\mathcal{EO}$ with input $(\phi, ID)$ such that
   $\phi(msk) = msk$ while ID equals $ID^*$ for IBE, or ID is a prefix of
   $ID^*$ for HIBE;

2. For $\mathcal{DO}$, which is only available for HIBE, there was no query
   $(\phi, ID_p, id_j)$ where $ID_p$ is a prefix of $ID^*$, such that
   $\phi(sk_{ID_p})$ allows a “trivial break,” e.g.,
   Dec(mpk, Delegate(mpk, $\phi$(Extract(msk, $ID_p$)), $id_j$), $\mathfrak{C}^*$) $\in \{M_0, M_1\}$.

The advantage of $\mathcal{A}$ is $|\Pr[\mathcal{A} \text{ wins}] - \frac{1}{2}|$.
If there is no PPT $\mathcal{A}$ with a non-negligible advantage in the game
above, we say the IBE scheme is $\Phi_e$-RKA secure, or the HIBE scheme is
$(\Phi_e, \Phi_d)$-RKA secure.


Complications in HIBE. The definition of the class $\Phi_d$ is much more
complicated than that of $\Phi_e$. The culprit is that the master secret key
msk is often “less structured.” To illustrate, msk in many pairing-based
constructions is just an exponent “protected” by the discrete logarithm
assumption, restricting the adversary from exploiting the output derived from a
related key. Meanwhile, different from Extract, there can be many valid inputs
to the Delegate algorithm even for the same identity, since identity-based
secret keys are often generated probabilistically, making it tricky to exclude
all possible “trivial” RKAs. In many schemes, these keys consist of group
elements that can be considered as a randomized version of the parent
identity-based secret key. We thus leave the definition of “trivial break” as a
placeholder and define specific structural restrictions for our HIBE scheme.


4 Security Against Related-Key Attack from Dual 
System Encryption 


4.1 Our RKA-Secure IBE Scheme 


Lewko and Waters [24] used DSE to lift the selective security of Boneh-Boyen
IBE [6] and Boneh-Boyen-Goh HIBE [7] to the composite order setting to obtain
adaptive security. At a high level, we make the following modification to the
dual system IBE scheme of Lewko and Waters [24]. Let $\alpha$ be the master
secret. Both ciphertexts and user secret keys now include randomized copies of
$v_1^{\alpha}$ for some element $v_1$ of order $p_1$. Related keys will then be
randomized and hence do not help much in decrypting the challenge ciphertext.
This is conceptually different from the collision-resistant identity renaming
of the Bellare et al. [3] framework, which rules out the collision of two user
secret keys for different identities when one originated from a related master
secret key with the $\phi$ function applied.
Setup($1^\lambda$): The private key generator (PKG) runs the bilinear group
generator $\mathcal{G}(1^\lambda)$ to get $(N = p_1 p_2 p_3, G, G_T, \hat{e})$
as defined in Sect. 2. Suppose $\mathcal{G}$ also gives generators $g_1$ and
$g_3$ of the subgroups $G_{p_1}$ and $G_{p_3}$ respectively. The PKG randomly
picks $\alpha \in \mathbb{Z}_N$, $h_1, u_1, v_1 \in G_{p_1}$. The public system
parameter is

$(N, G, G_T, \hat{e}, g_1, h_1, u_1, v_1, g_3)$.

The master public key is $(\hat{e}(g_1, g_1)^{\alpha}, v_1^{\alpha})$. The
master secret key is $\alpha$.

Extract(msk, ID): The PKG randomly picks $r \in \mathbb{Z}_N$,
$X_3, X_3' \in G_{p_3}$ and computes

$K_1 = g_1^{\alpha} (h_1 u_1^{ID} v_1^{\alpha})^{r} X_3$, $K_2 = g_1^{r} X_3'$.

As in existing RKA-secure IBE schemes [3], $K_1$ is computed from $\alpha$
every time, and the PKG does not reuse $g_1^{\alpha}$ or $v_1^{\alpha}$ from
past computations.

Enc(mpk, ID, M): To encrypt a message $M$ in the message space
$\mathcal{M} = G_T$ for ID, the sender randomly picks $s \in \mathbb{Z}_N$ and
outputs $\mathfrak{C} = (C_0, C_1, C_2)$ where

$C_0 = M \cdot \hat{e}(g_1, g_1)^{\alpha s}$, $C_1 = g_1^{s}$, $C_2 = (h_1 u_1^{ID} v_1^{\alpha})^{s}$.

Dec(mpk, $sk_{ID}$, $\mathfrak{C}$): Given a ciphertext
$\mathfrak{C} = (C_0, C_1, C_2)$ and a secret key $sk_{ID} = (K_1, K_2)$, the
recipient outputs $M = C_0 \cdot \hat{e}(C_2, K_2) / \hat{e}(C_1, K_1)$.
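The correctness of decryption can be checked with a small toy model, sketched below. It is not an implementation of the scheme and offers no security: group elements are represented by their discrete logarithms modulo a tiny composite $N$, so products become sums and the pairing becomes multiplication of exponents. The primes, function names, and the encoding of identities and messages as integers are all our own illustrative assumptions.

```python
import secrets

# Toy parameters: tiny primes, for illustration only (no security).
P1, P2, P3 = 1009, 1013, 1019
N = P1 * P2 * P3

# Additive "exponent" model: the integer x stands for g^x in G (or G_T),
# so multiplying group elements is addition mod N and exponentiation is
# multiplication mod N.
G1 = P2 * P3  # generator of the order-p1 subgroup
G3 = P1 * P2  # generator of the order-p3 subgroup


def pair(x: int, y: int) -> int:
    """Toy bilinear map: e(g^x, g^y) = e(g, g)^(x*y)."""
    return (x * y) % N


def rand_in(gen: int) -> int:
    """Random element of the subgroup generated by gen."""
    return (secrets.randbelow(N) * gen) % N


def setup():
    alpha = secrets.randbelow(N)                  # master secret key
    h1, u1, v1 = rand_in(G1), rand_in(G1), rand_in(G1)
    pub = {"h1": h1, "u1": u1, "v1": v1,
           "E": (pair(G1, G1) * alpha) % N,       # e(g1, g1)^alpha
           "V": (v1 * alpha) % N}                 # v1^alpha
    return pub, alpha


def extract(pub, alpha: int, identity: int):
    r = secrets.randbelow(N)
    # h1 * u1^ID * v1^alpha, recomputed from alpha on every call
    base = (pub["h1"] + identity * pub["u1"] + alpha * pub["v1"]) % N
    k1 = (G1 * alpha + r * base + rand_in(G3)) % N  # g1^alpha * base^r * X3
    k2 = (r * G1 + rand_in(G3)) % N                 # g1^r * X3'
    return k1, k2


def enc(pub, identity: int, m: int):
    s = secrets.randbelow(N)
    base = (pub["h1"] + identity * pub["u1"] + pub["V"]) % N
    return (m + s * pub["E"]) % N, (s * G1) % N, (s * base) % N


def dec(key, ct) -> int:
    (k1, k2), (c0, c1, c2) = key, ct
    # C0 * e(C2, K2) / e(C1, K1), written additively
    return (c0 + pair(c2, k2) - pair(c1, k1)) % N


pub, alpha = setup()
ct = enc(pub, identity=42, m=123456)
assert dec(extract(pub, alpha, 42), ct) == 123456  # correctness check
```

In this model the $G_{p_3}$ components of the key vanish under the pairing because `pair(G1, G3)` is a multiple of $N$, which mirrors the orthogonality $\hat{e}(h_i, h_j) = 1$ for $i \neq j$.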


Security. The main ingredients of DSE are semi-functional (SF) keys and SF
ciphertexts. An SF key can decrypt a normal ciphertext, and a normal key can
decrypt an SF ciphertext. However, an SF key cannot decrypt an SF ciphertext.
The security proof goes by a sequence of transformations: from normal
ciphertext to SF ciphertext, and then from normal keys to SF keys, one at a
time. In the end, the adversary holds only SF keys, which cannot help decrypt
the SF challenge ciphertext. This allows the last transition from the SF
challenge to an encryption of a random message, in which no adversary has any
advantage.

We define the following SF structures used only in the security proofs. They
are like their normal version in the actual scheme but “perturbed” by a
$G_{p_2}$ generator, denoted by either $g_2$ or $\hat{g}_2$ below.

An SF secret key (or just SF key) is in the form of
$(K_1' = K_1 \cdot g_2^{\gamma}, K_2' = K_2 \cdot g_2)$, where
$\gamma \in \mathbb{Z}_N$, and $(K_1, K_2)$ is a normal secret key. Naturally,
$\gamma$ is random.

An SF ciphertext is in the form of
$(C_0' = C_0, C_1' = C_1 \cdot \hat{g}_2, C_2' = C_2 \cdot g_2^{\delta})$,
where $\delta \in \mathbb{Z}_N$ and $(C_0, C_1, C_2)$ is a normal ciphertext.
Likewise, $\delta$ is also random.

Decrypting an SF ciphertext by an SF secret key will result in a message
“blinded” by $\hat{e}(g_2, g_2)^{\gamma - \delta}$. In case the exponents in
these extra blinding factors are zeros, decryption still works, which leads us
to the notion of nominally semi-functional (NSF) secret keys. An NSF secret key
is a special kind of SF key that can decrypt some corresponding SF ciphertexts,
which means $\gamma = \delta$. If an SF secret key is not nominally
semi-functional, it is truly semi-functional.


Theorem 2. Our IBE scheme is $\Phi$-RKA IND-ID-CPA secure under Assumptions 1,
2, and the $\Phi$-oracle DBDH assumption.
The security proof is given in Appendix B. Its resemblance with the existing
proof of Lewko and Waters [24] demonstrates our point that the DSE framework 
fits well with the RKA security. Here, we explain some of the intuitions for our 
security proof. Firstly, our scheme exploits different subgroups for building the 
SF objects. Like the usual DSE approach, we need to establish a nominally 
semi-functional secret key (which can always decrypt the corresponding semi- 
functional ciphertext) to avoid the paradox that the simulator can use the semi- 
functional ciphertext to check if a user secret key is also semi-functional. 

Nevertheless, to ensure this remains unnoticeable to the adversary, we want
to avoid the collision of the function values governing the $\gamma$ and
$\delta$ factors modulo one of the prime factors. This can be resolved easily
as in existing DSE-based proofs since one can factor the order of the composite
group when such a collision is found. For our case, we also need to ensure the
same holds true when the $\phi$ function is involved, which can be easily done
with another similar hybrid.

For an Extract($\phi$(msk), ID) oracle query, to prevent the adversary from
winning trivially, $\phi$ cannot be the identity map when ID is the challenge
identity. This restriction is exactly what we need for the above transition to
go through in the security proof regarding the $\gamma$ and $\delta$ factors.
Also, there is no further restriction on the $\phi$ function, and hence we can
support a wider class of $\Phi$.

Our last transition (from an SF ciphertext to a random ciphertext) is the
only transition where the simulator does not know the master secret key
$\alpha$, for which we resort to the $\Phi$-oracle DBDH assumption to answer
the related-key queries. This is the only place relying on this assumption in
our entire security proof.


4.2 Our RKA-Secure Hierarchical IBE Scheme 


Our system can be extended to Lewko-Waters HIBE [24]. A complication of 
allowing RKA attacks on the Delegate algorithm of HIBE is that the identity- 
based secret key of the parent may be composed of several group elements, and 
the adversary may apply different RKA functions to different group elements. 


Setup($1^\lambda$): We let $H$ denote the maximum depth of the HIBE. The PKG
runs the bilinear group generator $\mathcal{G}(1^\lambda)$ to get
$(N = p_1 p_2 p_3, G, G_T, \hat{e})$. Suppose $\mathcal{G}$ also gives
generators $g_1$ and $g_3$ of the subgroups $G_{p_1}$ and $G_{p_3}$,
respectively. The PKG randomly picks $\alpha \in \mathbb{Z}_N$,
$u_1, \ldots, u_H, h_1, v_1 \in G_{p_1}$. The public system parameter is

$(N, G, G_T, \hat{e}, g_1, h_1, u_1, \ldots, u_H, v_1, g_3)$.

The master public key is $(\hat{e}(g_1, g_1)^{\alpha}, v_1^{\alpha})$. The
master secret key is $\alpha$.

Extract(msk, $ID = (id_1, \ldots, id_j)$): The PKG randomly picks
$r \in \mathbb{Z}_N$, $X_3, X_3', X_{3,j+1}, \ldots, X_{3,H} \in G_{p_3}$ and
computes $sk_{ID} = (K_1, K_2, D_{j+1}, \ldots, D_H)$, where

$K_1 = g_1^{\alpha} (h_1 u_1^{id_1} \cdots u_j^{id_j} v_1^{\alpha})^{r} X_3$, $K_2 = g_1^{r} X_3'$, $\{D_i = u_i^{r} X_{3,i}\}_{\forall i \in \{j+1, \ldots, H\}}$.

Note that the PKG does not reuse $g_1^{\alpha}$ and $v_1^{\alpha}$ from past
computations.

Delegate(mpk, $sk_{id_1, \ldots, id_j}$, $id_{j+1}$): On input
$sk_{id_1, \ldots, id_j} = (K_1, K_2, D_{j+1}, \ldots, D_H)$, it randomly picks
$r' \in \mathbb{Z}_N$, $Y_3, Y_3', Y_{3,j+2}, \ldots, Y_{3,H} \in G_{p_3}$ and
computes

$K_1' = K_1 (h_1 u_1^{id_1} \cdots u_j^{id_j} u_{j+1}^{id_{j+1}} v_1^{\alpha})^{r'} D_{j+1}^{id_{j+1}} Y_3$, $K_2' = K_2 g_1^{r'} Y_3'$,

$D_{j+2}' = D_{j+2} u_{j+2}^{r'} Y_{3,j+2}$, $\ldots$, $D_H' = D_H u_H^{r'} Y_{3,H}$.

Enc(mpk, $ID = (id_1, \ldots, id_j)$, M): To encrypt a message $M \in G_T$ for
$(id_1, \ldots, id_j)$, the sender randomly picks $s \in \mathbb{Z}_N$ and
outputs $\mathfrak{C} = (C_0, C_1, C_2)$ where

$C_0 = M \cdot \hat{e}(g_1, g_1)^{\alpha s}$, $C_1 = g_1^{s}$, $C_2 = (h_1 u_1^{id_1} \cdots u_j^{id_j} v_1^{\alpha})^{s}$.

Dec(mpk, $sk_{ID}$, $\mathfrak{C}$): Given a ciphertext
$\mathfrak{C} = (C_0, C_1, C_2)$ and a secret key $sk_{ID} = (K_1, K_2, \ldots)$,
the recipient outputs $M = C_0 \cdot \hat{e}(C_2, K_2) / \hat{e}(C_1, K_1)$.
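As a consistency check, the following derivation sketches why a delegated key has the same form as an extracted key for the longer identity. It reads the key components as $K_1 = g_1^{\alpha} B_j^{r} X_3$, $K_2 = g_1^{r} X_3'$, and $D_i = u_i^{r} X_{3,i}$, where the shorthand $B_j := h_1 u_1^{id_1} \cdots u_j^{id_j} v_1^{\alpha}$ is our own notation, and it assumes the Delegate formulas as stated for fresh randomness $r'$.

```latex
\begin{align*}
K_1' &= K_1 \cdot \bigl(B_j\, u_{j+1}^{id_{j+1}}\bigr)^{r'} \cdot D_{j+1}^{id_{j+1}} \cdot Y_3 \\
     &= g_1^{\alpha} B_j^{r} X_3 \cdot \bigl(B_j\, u_{j+1}^{id_{j+1}}\bigr)^{r'}
        \cdot \bigl(u_{j+1}^{r} X_{3,j+1}\bigr)^{id_{j+1}} \cdot Y_3 \\
     &= g_1^{\alpha} \bigl(B_j\, u_{j+1}^{id_{j+1}}\bigr)^{r + r'}
        \cdot \bigl(X_3\, X_{3,j+1}^{id_{j+1}}\, Y_3\bigr), \\
K_2' &= K_2 \cdot g_1^{r'} \cdot Y_3' = g_1^{r + r'} \cdot \bigl(X_3' Y_3'\bigr), \\
D_i' &= D_i \cdot u_i^{r'} \cdot Y_{3,i} = u_i^{r + r'} \cdot \bigl(X_{3,i} Y_{3,i}\bigr),
        \qquad i = j+2, \ldots, H.
\end{align*}
```

The result is a key for $(id_1, \ldots, id_{j+1})$ with exponent $r + r'$, and the $G_{p_3}$ parts are re-randomized by the fresh elements $Y_3, Y_3', Y_{3,i}$.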


Security. We define the following SF structures used in the security proofs only.

An SF key for an identity $ID = (id_1, \ldots, id_j)$ is in the form of

$K_1' = K_1 \cdot g_2^{\gamma}$, $K_2' = K_2 \cdot g_2$, $D_{j+1}' = D_{j+1} \cdot g_2^{\gamma_{j+1}}$, $\ldots$, $D_H' = D_H \cdot g_2^{\gamma_H}$,

where $\gamma, \gamma_{j+1}, \ldots, \gamma_H \in \mathbb{Z}_N$, and
$(K_1, K_2, D_{j+1}, \ldots, D_H)$ is a normal secret key.

An SF ciphertext is in the form of
$(C_0' = C_0, C_1' = C_1 \cdot \hat{g}_2, C_2' = C_2 \cdot g_2^{\delta})$,
where $\delta \in \mathbb{Z}_N$ and $(C_0, C_1, C_2)$ is a normal ciphertext.
Decrypting an SF ciphertext using an SF secret key will result in a message
“blinded” by $\hat{e}(g_2, g_2)^{\gamma - \delta}$. An NSF secret key is a
special kind of SF key that can decrypt the SF ciphertext, which means
$\gamma = \delta$. Except for this key, other SF keys are truly
semi-functional.


Theorem 3. Our HIBE scheme is $(\Phi_e, \Phi_d)$-RKA IND-ID-CPA secure under
Assumptions 1, 2, and the $\Phi_e$-oracle DBDH assumption, where for every
delegation oracle query
$\phi = (\phi_1, \phi_2, \phi_{j+1}, \ldots, \phi_H) \in \Phi_d$ applied to the
identity-based secret key $(K_1, K_2, D_{j+1}, \ldots, D_H)$ of an identity ID
component-wise, we require that:


1. p= Phr 

2. yı: G — G and y2 : G — G are isomorphic functions. 

3. IfID is a prefix of the challenge identity, for all g € G, the value log, 4) #1(9) 
(which is well defined given the isomorphic property) is randomly distributed. 


The security proof is given in Appendix B. An intuitive explanation for the first two restrictions is that not all combinations of group elements constitute a well-formed identity-based secret key. To illustrate the last one, the adversary might supply $\phi_2$ as a trapdoor permutation (cf. auxiliary leakage, e.g., [16,31]), for which the adversary could recover the original term while there is no easy way for the security reduction to identify that. To avoid upgrading the underlying IBE schemes "too much" beyond the application of DSE techniques, we chose to enforce some random behavior for one of the component functions, $\phi_2$.
5 Extensions


Bellare et al. [2] showed the relationship between different RKA-secure cryptosystems. It is well known that IBE implies signatures. Bellare et al. [2] demonstrated that Naor's transform preserves RKA security. Similarly, the transform of Boneh, Canetti, Halevi, and Katz [8] using one-time signatures (trivially implied by IBE) turns a $\Phi$-RKA secure IBE scheme into a $\Phi$-RKA secure public-key encryption scheme against chosen-ciphertext attacks (CCA). So, our $\Phi$-RKA secure IBE can be transformed into CCA-secure encryption and signature schemes.

Our (H)IBE schemes described in composite order groups are very similar to 
the existing ones [24], which have been translated into prime order groups. 

Recently, tightly-secure IBE schemes [21] have been constructed from the matrix Diffie-Hellman assumption. The security reduction switches the keys to semi-functional by adding some kernel matrix. The ciphertext is switched to semi-functional similarly. The main difference from DSE is that the game hopping is done $\log(\lambda)$ times, and hence the security reduction is tighter. We leave it as future work to apply our approach to these IBE schemes.


6 Conclusion and Future Works 


The existence of trapdoors can be hard to detect [14,15]. There is growing interest in protecting cryptographic systems from tampering attacks such as related-key attacks (RKA), which apply a function to modify the key before it is used. Existing works mostly propose solutions tailor-made for resilience against a specific form of functions. We propose a design methodology that reduces the set of allowed functions to the underlying assumption, leading to an RKA-secure identity-based encryption (IBE) scheme and its hierarchical IBE extension.

We hope our work can inspire follow-up work in devising RKA-secure cryptosystems for a wide class of RKA functions instead of custom-made solutions for each kind of function. We also hope the rather direct correspondence between the real-world RKA security over a function family and the intractability of the related assumptions can stimulate more cryptanalysis. Other future directions include investigating the applicability of our methodology to related/generalized notions, such as security against related-randomness attacks [32] and complete non-malleability [13]; addressing non-trivial copy attacks in RKA security, possibly through the lens of non-malleable functions [11]; and investigating RKA security for escrow-free IBE [12] (existing constructions include full-domain-hash and exponent-inversion IBE schemes [12], and a lattice-based one [17]).


A Generic Security of Specific Cases of Our Assumption 


This section justifies the security of our $\Phi$-oracle DBDH assumption in the variant abstraction of Maurer [25]. This variant of the generic group model (GGM) internally stores tuples denoting group elements. We take the order $N$ of the generic group to be a composite of three distinct primes, $N = p_1 p_2 p_3$, a setting for which Boyen [10] has discussed a few caveats and justifications. The model internally stores two types of tuples, $(\mu_1, \mu_2, \mu_3)$ and $[\nu_1, \nu_2, \nu_3]$, where $\mu_i, \nu_i \in [1, p_i]$, to represent an element in the base group and the target group, respectively. This computational model provides two types of operations, add and mul, which add and multiply tuples, representing the group operation and the pairing, respectively.

The $\Phi$-oracle DBDH problem can be formalized as one of distinguishing two black-box accesses B and B' of the same type but with different distributions of the initial state. Specifically, for B, the model stores tuples

$$(1, 0, 0),\ (0, 1, 0),\ (0, 0, 1),\ (\alpha, x, 0),\ (s, y, 0),\ (v, 0, 0),\ (v\alpha, 0, 0),\ (v\alpha s, 0, 0),\ [\alpha s, 0, 0],$$

and for B', the model stores tuples

$$(1, 0, 0),\ (0, 1, 0),\ (0, 0, 1),\ (\alpha, x, 0),\ (s, y, 0),\ (v, 0, 0),\ (v\alpha, 0, 0),\ (v\alpha s, 0, 0),\ [t_1, t_2, t_3].$$

The computational model offers an additional oracle $\mathcal{O}$, which takes a polynomial $f_i$ of degree at most $d$ from $\Phi$ as input and stores $(f_i(\alpha), w_{i,1}, 0)$ and $(v f_i(\alpha), w_{i,2}, 0)$ in its state, where $w_{i,1}, w_{i,2}$ are sampled uniformly from $[1, p_2]$. The adversary can make at most $q$ add, mul, or $\mathcal{O}$ queries to the model.
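The state and operations of this tuple model can be sketched in a few lines of Python. This is an illustration only: the class name, the toy primes, and the handle-based interface are assumptions made for the example, and the initial state follows black box B with base tuples $(1,0,0), (0,1,0), (0,0,1), (\alpha,x,0), (s,y,0), (v,0,0), (v\alpha,0,0), (v\alpha s,0,0)$ and target tuple $[\alpha s,0,0]$.

```python
import random

P1, P2, P3 = 11, 13, 17   # hypothetical toy primes; real parameters are cryptographically large
MODS = (P1, P2, P3)

def _mod(t):
    """Reduce each position of a tuple modulo its own prime."""
    return tuple(c % p for c, p in zip(t, MODS))

class TupleModel:
    """Stores base-group tuples and target-group tuples; callers only see list handles."""

    def __init__(self, alpha, s, v, x, y):
        self.alpha, self.v = alpha, v
        self.base = [_mod(t) for t in [
            (1, 0, 0), (0, 1, 0), (0, 0, 1),
            (alpha, x, 0), (s, y, 0),
            (v, 0, 0), (v * alpha, 0, 0), (v * alpha * s, 0, 0),
        ]]
        self.target = [_mod((alpha * s, 0, 0))]      # challenge element of black box B

    def add(self, i, j):
        """Group operation on two stored base tuples; returns the new handle."""
        self.base.append(_mod(tuple(a + b for a, b in zip(self.base[i], self.base[j]))))
        return len(self.base) - 1

    def mul(self, i, j):
        """Pairing of two stored base tuples; the result is stored as a target tuple."""
        self.target.append(_mod(tuple(a * b for a, b in zip(self.base[i], self.base[j]))))
        return len(self.target) - 1

    def oracle(self, coeffs, rng=random):
        """Oracle O for a polynomial f given by its coefficients (lowest degree first)."""
        f_alpha = sum(c * self.alpha ** k for k, c in enumerate(coeffs))
        w1, w2 = rng.randrange(1, P2 + 1), rng.randrange(1, P2 + 1)
        self.base.append(_mod((f_alpha, w1, 0)))
        self.base.append(_mod((self.v * f_alpha, w2, 0)))
        return len(self.base) - 2, len(self.base) - 1
```

Pairing the stored tuples for $(\alpha, x, 0)$ and $(s, y, 0)$ yields $[\alpha s, xy, 0]$, which matches the challenge tuple $[\alpha s, 0, 0]$ in the first position but not in the second unless $xy = 0 \bmod p_2$; this is the role of the random second-position values in the collision analysis.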

If no collision occurs in B or in B', the views of the adversary are trivially identical. Next, we bound the collision probability. In the former case of accessing B, a collision occurs in either the base group or the target group.

For the base group, the collision probability is bounded by $q^2/p$ (based on an existing analysis [25]), where $p$ is the minimum of $(p_1, p_2, p_3)$.

For the target group, when a collision occurs, the values in all three positions must be identical, which means the values in the second position (mod $p_2$) must be identical. Note that any value in the second position is of the form

$$c_0 + c_1 x + c_2 y + c_3 w_{i,1} + c_4 w_{i,2} + d_{1,2}\, x y + d_{1,3}\, x w_{i,1} + d_{1,4}\, x w_{i,2} + d_{2,3}\, y w_{i,1} + d_{2,4}\, y w_{i,2} + d_{3,4}\, w_{i,1} w_{i,2},$$

i.e., a multivariate polynomial in the variables $(x, y, w_{1,1}, w_{1,2}, \ldots, w_{q,1}, w_{q,2})$ whose degree is bounded by 2. Therefore, by a lemma due to Schwartz [27], no collision occurs, except with probability $2q^2/p$.
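The Schwartz lemma invoked here states that a nonzero polynomial of total degree $d$ over $\mathbb{F}_p$ vanishes on at most a $d/p$ fraction of points. The sketch below checks this exhaustively for one degree-2 polynomial of a similar shape. The polynomial, the prime 13, and the function names are hypothetical choices for illustration, not taken from the analysis.

```python
from itertools import product

def zero_fraction(poly, p, nvars):
    """Fraction of points in F_p^nvars at which poly vanishes (exhaustive count)."""
    zeros = sum(1 for pt in product(range(p), repeat=nvars) if poly(*pt) % p == 0)
    return zeros / p ** nvars

def example(x, y, w):
    """Hypothetical degree-2 polynomial 5 + x*y + 3*y*w in the variables (x, y, w)."""
    return 5 + x * y + 3 * y * w

# The lemma predicts at most degree / p = 2 / 13 of the points are zeros.
fraction = zero_fraction(example, 13, 3)
bound = 2 / 13
```

Two stored values collide only if the difference of their polynomials evaluates to zero, so the per-pair collision probability is at most $2/p$; a union bound over pairs of stored values then gives a bound quadratic in the number of queries.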


Conditioned on no collision occurring among the values in the second position, any value in the first position has the form


Co + Cy + C25 + €3U + C4Ua + Cavas + cov a + crva? + cgvas + Covas 


q 
+ cova? 3? +Y afila ED vfila +Z dwafila +Z evaso) 


i=1 


+ Yana J+ he? fio j+ Wdafi(a Sa (a) + hias, 
i=1 i=1 


which is a multivariate polynomial of (a,s,v), and the degree is bounded by 


d+ 1. Similarly [27], no collision occurs except for probability “NW +a)+20" 
It is also evident that the collision probability in the latter case (accessing B’) 
is bounded by the former one (the black-box access to B). Thus, the advantage 
of our assumption above is bounded by (arr tate" which is negligible. 
The non-interactive version of the assumption can be analyzed in this GGM 
similarly. Applying an analogous analysis (which we skip due to the space limit), 
the function family ® can be easily extended to rational functions, as long as 
the degrees of their denominators are also bounded. This stems from an idea of 
Boyen [10], which notationally replaces a rational exponent with a polynomial 
multiplied with the (non-zero) least common multiple of all denominators. 


B Security Proofs 


B.1 Proof of Theorem 2 


Proof. We prove by a hybrid argument using a sequence of games. The first game $\mathrm{Game}_{real}$ is the real $\Phi$-RKA IND-ID-CPA game. We denote the challenge identity by ID*. The second game $\mathrm{Game}_{res}$ is the same as $\mathrm{Game}_{real}$, except that the adversary cannot ask for the secret key of an identity ID = ID* mod $p_2$. This restriction will be retained throughout the subsequent games. Let $q$ be the number of extraction oracle queries. For $k = 0$ to $q$, we define $\mathrm{Game}_k$ as:

$\mathrm{Game}_k$: It is the same as $\mathrm{Game}_{res}$, except that the challenge ciphertext is semi-functional (SF), and the keys used to answer the first $k$ oracle queries are SF. The keys for the rest of the queries are normal.

As a result, in $\mathrm{Game}_0$, all keys are normal and the challenge ciphertext is SF. In $\mathrm{Game}_q$, all keys and the challenge ciphertext are SF. We defer to the lemmas below to prove the indistinguishability between these games.

The last game is $\mathrm{Game}_{final}$, which is the same as $\mathrm{Game}_q$ except that the challenge ciphertext is a semi-functional ciphertext encrypting a random message instead of one of the two challenge messages. In $\mathrm{Game}_{final}$, the value of $b'$ is information-theoretically hidden from $\mathcal{A}$. Hence $\mathcal{A}$ has no advantage in winning $\mathrm{Game}_{final}$. We will prove below that if Assumptions 1, 2, and the $\Phi$-oracle DBDH assumption hold, then $\mathrm{Game}_{real}$ is indistinguishable from $\mathrm{Game}_{final}$.
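The way the lemmas combine can be written as a telescoping sum over the game sequence. This is the standard hybrid-argument bound, stated here as a sketch; the per-hop terms are the quantities bounded by the individual lemmas.

```latex
\begin{align*}
\bigl|\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{real}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{final})\bigr|
\le{}& \bigl|\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{real}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{res})\bigr|
     + \bigl|\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{res}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{0})\bigr| \\
&+ \sum_{k=1}^{q} \bigl|\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{k-1}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{k})\bigr|
     + \bigl|\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{q}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{final})\bigr|.
\end{align*}
```

Since the adversary has no advantage in $\mathrm{Game}_{final}$ and $q$ is polynomial, the advantage in the real game is negligible whenever every term on the right is.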


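Lemma 1. We can construct an algorithm $\mathcal{B}$ with a non-negligible advantage in breaking Assumption 1 or Assumption 2 if there exists an adversary $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{real}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{res}) = \epsilon$.

The proof of Lemma 1 is easy (e.g., see [24,30]) and is omitted.

Lemma 2. We can construct an algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 1 if there exists $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{res}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{0}) = \epsilon$.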


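Proof. Given $(g, X_3, T)$ from Assumption 1, $\mathcal{B}$ can simulate $\mathrm{Game}_{res}$ or $\mathrm{Game}_0$. $\mathcal{B}$ chooses random $a, b, c, \alpha \in \mathbb{Z}_N$, $h_1 \in G_{p_1}$. $\mathcal{B}$ sets $g_1 = g$, $h_1 = g^{a}$, $u_1 = g^{b}$, $v_1 = g^{c}$, $g_3 = X_3$. $\mathcal{B}$ generates the rest of mpk according to Setup and sets msk $= \alpha$.

For the RKA-extraction oracle queries $(\phi, \mathrm{ID})$, $\mathcal{B}$ returns Extract($\phi$(msk), ID). Note that $\mathcal{B}$ can check if $\phi(\alpha) = \alpha$ using the knowledge of $\alpha$.

In the challenge phase, $\mathcal{A}$ sends $\mathcal{B}$ two messages $M_0^*, M_1^*$, and an identity ID*. $\mathcal{B}$ randomly picks a bit $b' \in \{0, 1\}$. $\mathcal{B}$ calculates the challenge ciphertext as:

$$C_0^* = M_{b'}^* \cdot \hat{e}(T, g_1)^{\alpha}, \qquad C_1^* = T, \qquad C_2^* = T^{\,a + b \cdot \mathrm{ID}^* + c\alpha}.$$

If $T = g^{s}$, this is a normal ciphertext, and hence $\mathcal{B}$ simulates $\mathrm{Game}_{res}$. If $T = g^{s} Y_2$, this is an SF ciphertext with $\hat{g}_2 = Y_2$ and $\hat{g}_2^{\delta} = Y_2^{\,a + b \cdot \mathrm{ID}^* + c\alpha}$, and $\mathcal{B}$ simulates $\mathrm{Game}_0$ with $\delta = a + b \cdot \mathrm{ID}^* + c\alpha$. By the Chinese remainder theorem, the values of $a, b, c, \alpha$ mod $p_2$ are not correlated with the corresponding values modulo $p_1$. If $\mathcal{A}$ can distinguish between $\mathrm{Game}_{res}$ and $\mathrm{Game}_0$, $\mathcal{B}$ can break Assumption 1.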
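The Chinese remainder theorem argument used in this proof says that, for a uniformly random value in $\mathbb{Z}_{p_1 p_2}$, its residue modulo $p_1$ reveals nothing about its residue modulo $p_2$. A minimal sketch, with hypothetical toy primes and a made-up function name, confirms this by exhaustive counting:

```python
from collections import Counter

def residue_pairs(p1, p2):
    """Count how often each pair (a mod p1, a mod p2) occurs as a ranges over Z_{p1*p2}."""
    return Counter((a % p1, a % p2) for a in range(p1 * p2))

# Every one of the p1 * p2 residue pairs occurs exactly once, so the pair is
# uniform on Z_{p1} x Z_{p2} and the two coordinates are independent.
counts = residue_pairs(5, 7)
```

In the proof, this is why the mod-$p_2$ parts of the exponents, which only surface in the semi-functional components, look fresh to an adversary who has seen the exponents only through $G_{p_1}$ elements.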


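Lemma 3. We can construct an algorithm $\mathcal{B}$ which breaks Assumption 2 with advantage $\epsilon$ if there exists $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{\ell-1}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{\ell}) = \epsilon$.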


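Proof. Given $(g, X_1 X_2, X_3, Y_2 Y_3, T)$ from Assumption 2, $\mathcal{B}$ can simulate $\mathrm{Game}_{\ell-1}$ or $\mathrm{Game}_{\ell}$. $\mathcal{B}$ chooses random $a, b, c, \alpha \in \mathbb{Z}_N$, sets $g_1 = g$, $h_1 = g^{a}$, $u_1 = g^{b}$, $v_1 = g^{c}$, and $g_3 = X_3$, and generates the rest of mpk and msk $= \alpha$ according to Setup.

For the $k$-th distinct RKA-extraction oracle query on $\mathrm{ID}_k$ and $\phi_k$, $\mathcal{B}$ can compute $\phi_k(\alpha)$ and check if $\phi_k(\alpha) = \alpha$ using the knowledge of $\alpha$.

- If $k < \ell$, $\mathcal{B}$ returns Extract($\phi_k$(msk), $\mathrm{ID}_k$).
- If $k > \ell$, $\mathcal{B}$ calculates $(K_1, K_2) \leftarrow$ Extract($\phi_k$(msk), $\mathrm{ID}_k$) using msk. $\mathcal{B}$ randomly picks $\gamma_1, \gamma_2 \in \mathbb{Z}_N$ and returns the (related) SF key:

  $$K_1' = K_1 \cdot (Y_2 Y_3)^{\gamma_1}, \qquad K_2' = K_2 \cdot (Y_2 Y_3)^{\gamma_2}.$$

  This is semi-functional. By the Chinese remainder theorem, the values of $\gamma_1, \gamma_2$ modulo $p_2$ and modulo $p_3$ are not correlated.
- If $k = \ell$, $\mathcal{B}$ chooses random $X_3, X_3' \in G_{p_3}$, and returns the (related) key:

  $$K_1 = g^{\phi_\ell(\alpha)} \cdot T^{\,a + b \cdot \mathrm{ID}_\ell + c \cdot \phi_\ell(\alpha)} \cdot X_3, \qquad K_2 = T \cdot X_3'.$$

  If $T = Z_1 Z_3 \in G_{p_1 p_3}$, where $Z_i \in G_{p_i}$, it is a normal key with $g^{r} = Z_1$. Hence $\mathcal{B}$ simulates $\mathrm{Game}_{\ell-1}$. If $T = Z_1 Z_2 Z_3 \in G$, it is an SF key with $g_2^{\gamma} = Z_2^{\,a + b \cdot \mathrm{ID}_\ell + c \cdot \phi_\ell(\alpha)}$ and $g_2 = Z_2$. Hence $\mathcal{B}$ simulates $\mathrm{Game}_{\ell}$. Note that the value of $\gamma$ mod $p_2$ is not correlated with the values of $a$, $b$, $c$, and $\alpha$ modulo $p_1$.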


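In the challenge phase, $\mathcal{A}$ sends $\mathcal{B}$ two messages $M_0^*, M_1^*$, and an identity ID*. $\mathcal{B}$ chooses a random bit $b' \in \{0, 1\}$ and calculates the challenge ciphertext:

$$C_0^* = M_{b'}^* \cdot \hat{e}(X_1 X_2, g_1)^{\alpha}, \qquad C_1^* = (X_1 X_2), \qquad C_2^* = (X_1 X_2)^{\,a + b \cdot \mathrm{ID}^* + c\alpha}.$$

It is an SF ciphertext with $\hat{g}_2 = X_2$ and $\hat{g}_2^{\delta} = X_2^{\,a + b \cdot \mathrm{ID}^* + c\alpha}$. If the $\ell$-th SF key is created for decrypting the challenge ciphertext, i.e., $\mathrm{ID}_\ell = \mathrm{ID}^*$ and $\phi_\ell(\alpha) = \alpha$, its $\gamma$ factor becomes $\delta = a + b \cdot \mathrm{ID}^* + c \cdot \alpha$, so it is a nominally semi-functional key which will always decrypt the challenge ciphertext.

Finally, we have to consider the view of the adversary in $\mathrm{Game}_{\ell}$. The value of $\delta = a + b \cdot \mathrm{ID}^* + c \cdot \alpha$ mod $p_2$ is uncorrelated to $\gamma = a + b \cdot \mathrm{ID}_\ell + c \cdot \phi_\ell(\alpha)$ since $a, b, c, \alpha$ are only known modulo $p_1$ and:

- Case 1: $\mathrm{ID}^* \neq \mathrm{ID}_\ell$. Then $a + b \cdot \mathrm{ID}_\ell$ is uncorrelated (footnote 7) to $a + b \cdot \mathrm{ID}^*$ modulo $p_2$. It implies $\gamma$ is uncorrelated to $\delta$ since $a, b, c, \alpha$ are randomly chosen from $\mathbb{Z}_N$.

- Case 2: $\mathrm{ID}^* = \mathrm{ID}_\ell$ and $\phi_\ell(\alpha) \neq \alpha$. Then $a + c \cdot \phi_\ell(\alpha)$ is uncorrelated (footnote 8). It implies $\gamma$ is uncorrelated to $\delta$ since $a, b, c, \alpha$ are randomly chosen from $\mathbb{Z}_N$.

By definition, the adversary cannot make a query with $\mathrm{ID}^* = \mathrm{ID}_\ell$ and $\phi_\ell(\alpha) = \alpha$. So, $\mathcal{B}$ can break Assumption 2 if $\mathcal{A}$ can distinguish $\mathrm{Game}_{\ell-1}$ and $\mathrm{Game}_{\ell}$.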


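Lemma 4. Given an adversary $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{q}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{final}) = \epsilon$, we can construct an algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking the $\Phi$-oracle DBDH assumption.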


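Proof. Given $(g, g^{\alpha} X_2, X_3, g^{s} Y_2, Z_2, v, v^{\alpha}, v^{\alpha s}, T)$ and access to an oracle $\mathcal{O}$ from the $\Phi$-oracle DBDH assumption, $\mathcal{B}$ chooses random $a, b \in \mathbb{Z}_N$ and sets

$$g_1 = g, \qquad h_1 = g^{a}, \qquad u_1 = g^{b}, \qquad v_1 = v, \qquad \hat{e}(g_1, g_1)^{\alpha} = \hat{e}(g, g^{\alpha} X_2).$$

$\mathcal{B}$ implicitly sets msk $= \alpha$. $\mathcal{B}$ sends the master public key mpk to $\mathcal{A}$.

$\mathcal{B}$ can calculate the semi-functional secret key as follows. $\mathcal{B}$ randomly picks $r \in \mathbb{Z}_N$, $R_2, R_2' \in G_{p_2}$, and $R_3, R_3' \in G_{p_3}$, and returns:

$$K_1' = (g^{\alpha} X_2) \cdot (h_1 u_1^{\mathrm{ID}} v_1^{\alpha})^{r} \cdot R_2 \cdot R_3, \qquad K_2' = g^{r} \cdot R_2' \cdot R_3'.$$

If it is a related key query with input $\phi$, then $\mathcal{B}$ asks $\mathcal{O}(\phi)$ to obtain the related key $(g^{\phi(\alpha)} W_2, v^{\phi(\alpha)} W_2')$. $\mathcal{B}$ can answer all extraction oracle queries by:

$$K_1' = (g^{\phi(\alpha)} W_2) \cdot (h_1 u_1^{\mathrm{ID}} \cdot v^{\phi(\alpha)} W_2')^{r} \cdot R_2 \cdot R_3, \qquad K_2' = g^{r} \cdot R_2' \cdot R_3'.$$

Note that $\mathcal{B}$ can check if $\phi(\alpha) = \alpha$ by checking if $g^{\phi(\alpha)} W_2 / (g^{\alpha} X_2)$ is in the subgroup $G_{p_2}$ but not $G_{p_1}$ and $G_{p_3}$. This is easily doable using $g \in G_{p_1}$ and $X_3 \in G_{p_3}$.

Finally, $\mathcal{B}$ picks a random bit $b'$ and calculates the SF challenge ciphertext:

$$C_0^* = M_{b'}^* \cdot T, \qquad C_1^* = (g^{s} Y_2), \qquad C_2^* = (g^{s} Y_2)^{\,a + b \cdot \mathrm{ID}^*} \cdot v^{\alpha s}.$$

If $T = \hat{e}(g, g)^{\alpha s}$, $\mathcal{B}$ simulates $\mathrm{Game}_q$; otherwise, it simulates $\mathrm{Game}_{final}$. If $\mathcal{A}$ can distinguish between these two, $\mathcal{B}$ can break the $\Phi$-oracle DBDH assumption.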
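The subgroup test mentioned in this proof can be illustrated in a toy exponent-vector model, where $g_1^{x_1} g_2^{x_2} g_3^{x_3}$ is stored as $(x_1, x_2, x_3)$. One way to realize the test, assumed here rather than spelled out in the proof, is to pair the candidate element with $g \in G_{p_1}$ and with $X_3 \in G_{p_3}$: both pairings are trivial exactly when the element has no $G_{p_1}$ or $G_{p_3}$ component. The primes and function names are hypothetical.

```python
MODS = (101, 103, 107)          # hypothetical small primes p1, p2, p3
IDENTITY_T = (0, 0, 0)          # identity of the target group in this toy model

def pair(a, b):
    """Toy pairing: component-wise product of exponent vectors."""
    return tuple((x * y) % p for x, y, p in zip(a, b, MODS))

def div(a, b):
    """Group division a / b in the toy model."""
    return tuple((x - y) % p for x, y, p in zip(a, b, MODS))

def only_in_gp2(elem, g, x3):
    """True if elem has trivial G_{p1} and G_{p3} parts, tested by pairing with g and X3."""
    return pair(elem, g) == IDENTITY_T and pair(elem, x3) == IDENTITY_T

def phi_fixes_alpha(related, original, g, x3):
    """Check phi(alpha) = alpha (mod p1) from g^{phi(alpha)} W_2 and g^{alpha} X_2."""
    return only_in_gp2(div(related, original), g, x3)
```

For instance, with `g = (1, 0, 0)` and `x3 = (0, 0, 9)`, the pair `(17, 44, 0)` and `(17, 5, 0)` passes the test, while `(18, 44, 0)` against `(17, 5, 0)` does not. The test only sees the exponent modulo $p_1$, which matches the fact that $g$ has order $p_1$.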


B.2 Proof of Theorem 3 


Proof. We prove by a hybrid argument using a sequence of games. The first game $\mathrm{Game}_{real}$ is the real $(\Phi_e, \Phi_d)$-RKA IND-ID-CPA game, and we denote the challenge identity by $\mathrm{ID}^* = (id_1^*, \ldots, id_{j^*}^*)$.

The second game $\mathrm{Game}_{res}$ is the same as $\mathrm{Game}_{real}$, except that the adversary cannot ask for keys for identities which are prefixes of ID* modulo $p_2$, for both the extraction oracle EO and the delegation oracle DO. This restriction will be retained throughout the subsequent games. After that, we use $q$ to denote the number of distinct ID queries to EO and DO. For $k = 0$ to $q$, we define $\mathrm{Game}_k$ as:

$\mathrm{Game}_k$: It is the same as $\mathrm{Game}_{res}$, except that the challenge ciphertext is SF, and the keys used to answer the first $k$ oracle queries are SF. The keys for the rest of the queries are normal.

As a result, in $\mathrm{Game}_0$, all keys are normal and the challenge ciphertext is SF. In $\mathrm{Game}_q$, all keys and the challenge ciphertext are SF.

The last game is $\mathrm{Game}_{final}$, which is the same as $\mathrm{Game}_q$ except that the challenge ciphertext is an SF encryption of a random message.

The following lemmas prove the indistinguishability between these games.

Footnote 7: The case that $\mathrm{ID}^* \neq \mathrm{ID}_\ell$ and $\mathrm{ID}^* = \mathrm{ID}_\ell$ mod $p_2$ is eliminated by $\mathrm{Game}_{res}$.
Footnote 8: The case that $\phi_\ell(\alpha) \neq \alpha$ and $\phi_\ell(\alpha) = \alpha$ mod $p_2$ can be eliminated by an extra game similar to $\mathrm{Game}_{res}$ considering $a + c \cdot \alpha$ modulo $p_2$. We omit the repetitive details.


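Lemma 5. Given an adversary $\mathcal{A}$ with $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{real}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{res}) = \epsilon$, we can construct an algorithm $\mathcal{B}$ with a non-negligible advantage in breaking Assumption 1 or 2.

The proof of Lemma 5 is easy and is omitted.

Lemma 6. We can construct an algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 1 if there exists $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{res}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{0}) = \epsilon$.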


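Proof. Given $(g, X_3, T)$ from Assumption 1, $\mathcal{B}$ can simulate $\mathrm{Game}_{res}$ or $\mathrm{Game}_0$ with $\mathcal{A}$. $\mathcal{B}$ uses the bilinear group context from the assumption for the public system parameters, and chooses random $a, b_1, \ldots, b_H, c, \alpha \in \mathbb{Z}_N$, $h_1 \in G_{p_1}$. $\mathcal{B}$ sets $g_1 = g$, $h_1 = g^{a}$, $u_1 = g^{b_1}, \ldots, u_H = g^{b_H}$, $v_1 = g^{c}$, $g_3 = X_3$. $\mathcal{B}$ generates the rest of mpk according to Setup and sets msk $= \alpha$.

For the RKA-extraction oracle queries $(\phi, \mathrm{ID})$, $\mathcal{B}$ returns Extract($\phi$(msk), ID). Note that $\mathcal{B}$ can check if $\phi(\alpha) = \alpha$ using the knowledge of $\alpha$.

In the challenge phase, $\mathcal{A}$ sends $\mathcal{B}$ two messages $M_0^*, M_1^*$, and an identity $\mathrm{ID}^* = (id_1^*, \ldots, id_{j^*}^*)$. $\mathcal{B}$ picks a random bit $b'$ and derives the challenge ciphertext:

$$C_0^* = M_{b'}^* \cdot \hat{e}(T, g)^{\alpha}, \qquad C_1^* = T, \qquad C_2^* = T^{\,a + \sum_{i=1}^{j^*} b_i\, id_i^* + c\alpha}.$$

If $T = g^{s}$, this is a normal ciphertext, and hence $\mathcal{B}$ simulates $\mathrm{Game}_{res}$. If $T = g^{s} Y_2$, this is an SF ciphertext with $\hat{g}_2 = Y_2$ and $\hat{g}_2^{\delta} = Y_2^{\,a + \sum_{i=1}^{j^*} b_i\, id_i^* + c\alpha}$, and hence $\mathcal{B}$ simulates $\mathrm{Game}_0$ with $\delta = a + \sum_{i=1}^{j^*} b_i\, id_i^* + c\alpha$. By the Chinese remainder theorem, the values of $a, b_1, \ldots, b_H, c, \alpha$ mod $p_2$ are not correlated with the corresponding values modulo $p_1$. Therefore, if $\mathcal{A}$ can distinguish between $\mathrm{Game}_{res}$ and $\mathrm{Game}_0$, $\mathcal{B}$ can break Assumption 1 with the same probability.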


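Lemma 7. We can construct an algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 2 if there exists $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{\ell-1}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{\ell}) = \epsilon$.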


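Proof. Given $(g, X_1 X_2, X_3, Y_2 Y_3, T)$ from Assumption 2, $\mathcal{B}$ can simulate $\mathrm{Game}_{\ell-1}$ or $\mathrm{Game}_{\ell}$ with $\mathcal{A}$. $\mathcal{B}$ chooses random $a, b_1, \ldots, b_H, c, \alpha \in \mathbb{Z}_N$. As in the proof of the last lemma, $\mathcal{B}$ sets $g_1 = g$, $h_1 = g^{a}$, $u_1 = g^{b_1}, \ldots, u_H = g^{b_H}$, $v_1 = g^{c}$, and $g_3 = X_3$. $\mathcal{B}$ generates the rest of mpk according to Setup and sets msk $= \alpha$.

For the $k$-th distinct RKA-extraction oracle query on $\mathrm{ID}_k = (id_1, \ldots, id_j)$ and $\phi_k$, $\mathcal{B}$ can check if $\phi_k(\alpha) = \alpha$ by the knowledge of $\alpha$.

- If $k < \ell$, $\mathcal{B}$ returns Extract($\phi_k$(msk), $\mathrm{ID}_k$).
- If $k > \ell$, $\mathcal{B}$ derives $(K_1, K_2, D_{j+1}, \ldots, D_H) \leftarrow$ Extract($\phi_k$(msk), $\mathrm{ID}_k$) by msk. $\mathcal{B}$ randomly picks $\gamma_1, \gamma_2, \gamma_{j+1}, \ldots, \gamma_H \in \mathbb{Z}_N$ and returns the (related) SF key:

  $$K_1' = K_1 \cdot (Y_2 Y_3)^{\gamma_1}, \qquad K_2' = K_2 \cdot (Y_2 Y_3)^{\gamma_2}, \qquad \{ D_i' = D_i \cdot (Y_2 Y_3)^{\gamma_i} \}_{\forall i \in \{j+1, \ldots, H\}}.$$

  This is semi-functional. By the Chinese remainder theorem, the values of $\gamma_1, \gamma_2, \gamma_{j+1}, \ldots, \gamma_H$ modulo $p_2$ and modulo $p_3$ are not correlated.
- If $k = \ell$, $\mathcal{B}$ chooses random $X_3, X_3', X_{3,j+1}, \ldots, X_{3,H} \in G_{p_3}$ and returns the (related) key:

  $$K_1 = g^{\phi_\ell(\alpha)} \, T^{\,a + \sum_{i=1}^{j} b_i\, id_i + c\,\phi_\ell(\alpha)} \, X_3, \qquad K_2 = T X_3', \qquad \{ D_i = T^{b_i} X_{3,i} \}_{\forall i \in \{j+1, \ldots, H\}}.$$

  If $T = Z_1 Z_3 \in G_{p_1 p_3}$, where $Z_i \in G_{p_i}$, it is a normal key with $g^{r} = Z_1$. Hence $\mathcal{B}$ simulates $\mathrm{Game}_{\ell-1}$. If $T = Z_1 Z_2 Z_3 \in G$, it is an SF key with $g_2 = Z_2$, $g_2^{\gamma} = Z_2^{\,a + \sum_{i=1}^{j} b_i\, id_i + c\,\phi_\ell(\alpha)}$, $g_2^{\gamma_{j+1}} = Z_2^{b_{j+1}}, \ldots, g_2^{\gamma_H} = Z_2^{b_H}$. Hence $\mathcal{B}$ simulates $\mathrm{Game}_{\ell}$. Again, note that the values of $\gamma, \gamma_{j+1}, \ldots, \gamma_H$ mod $p_2$ are not correlated with the values of $a, b_1, \ldots, b_H, c$ and $\alpha$ modulo $p_1$.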


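For the $k$-th distinct RKA-delegation query on $\mathrm{ID}_k = (id_1, \ldots, id_{j-1}, id_j)$ and $\phi_k = (\phi_1, \phi_2, \phi_j, \ldots, \phi_H)$:

- If $k < \ell$, $\mathcal{B}$ calculates $(K_1, K_2, D_j, \ldots, D_H) \leftarrow$ Extract(msk, $(id_1, \ldots, id_{j-1})$). $\mathcal{B}$ returns Delegate(mpk, $(\phi_1(K_1), \phi_2(K_2), \phi_j(D_j), \ldots, \phi_H(D_H))$, $id_j$).

- If $k > \ell$, $\mathcal{B}$ calculates $sk_{\mathrm{ID}_k}$ as above. Denote $sk_{\mathrm{ID}_k} = (K_1, K_2, D_{j+1}, \ldots, D_H)$. $\mathcal{B}$ randomly picks $\gamma_1, \gamma_2, \gamma_{j+1}, \ldots, \gamma_H \in \mathbb{Z}_N$ and returns the (related) SF key:

  $$K_1' = K_1 \cdot (Y_2 Y_3)^{\gamma_1}, \qquad K_2' = K_2 \cdot (Y_2 Y_3)^{\gamma_2}, \qquad \{ D_i' = D_i \cdot (Y_2 Y_3)^{\gamma_i} \}_{\forall i \in \{j+1, \ldots, H\}}.$$

- If $k = \ell$, $\mathcal{B}$ picks $X_3, X_3', X_{3,j}, \ldots, X_{3,H} \in G_{p_3}$ and computes the key:

  $$K_1 = g^{\alpha} \cdot T^{\,a + \sum_{i=1}^{j-1} b_i\, id_i + c\alpha} \cdot X_3, \qquad K_2 = T \cdot X_3', \qquad \{ D_i = T^{b_i} \cdot X_{3,i} \}_{\forall i \in \{j, \ldots, H\}}.$$

  $\mathcal{B}$ returns Delegate(mpk, $(\phi_1(K_1), \phi_2(K_2), \phi_j(D_j), \ldots, \phi_H(D_H))$, $id_j$). If $T = Z_1 Z_3 \in G_{p_1 p_3}$, where $Z_i \in G_{p_i}$, it is a normal key with $g^{r} = Z_1$. Hence $\mathcal{B}$ simulates $\mathrm{Game}_{\ell-1}$. If $T = Z_1 Z_2 Z_3 \in G$, it is a related SF key with $g_2 = \phi_2(Z_2)$ due to the isomorphic property of $\phi_2$, $g_2^{\gamma} = \phi_1\bigl(Z_2^{\,a + \sum_{i=1}^{j-1} b_i\, id_i + c\alpha}\bigr) \cdot \phi_j\bigl(Z_2^{b_j}\bigr)^{id_j}$, $g_2^{\gamma_{j+1}} = \phi_{j+1}\bigl(Z_2^{b_{j+1}}\bigr), \ldots,$ and $g_2^{\gamma_H} = \phi_H\bigl(Z_2^{b_H}\bigr)$. Hence $\mathcal{B}$ simulates $\mathrm{Game}_{\ell}$. Again, note that the values of $\gamma, \gamma_{j+1}, \ldots, \gamma_H$ mod $p_2$ are not correlated with the values of $a, b_1, \ldots, b_H, c$, and $\alpha$ modulo $p_1$.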
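$\mathcal{A}$ sends $\mathcal{B}$ two messages $M_0^*, M_1^*$ and an identity $\mathrm{ID}^* = (id_1^*, \ldots, id_{j^*}^*)$ in the challenge phase. $\mathcal{B}$ picks a random bit $b'$ and derives the challenge ciphertext:

$$C_0^* = M_{b'}^* \cdot \hat{e}(X_1 X_2, g_1)^{\alpha}, \qquad C_1^* = (X_1 X_2), \qquad C_2^* = (X_1 X_2)^{\,a + \sum_{i=1}^{j^*} b_i\, id_i^* + c\alpha}.$$

It is an SF ciphertext with $\hat{g}_2 = X_2$ and $\hat{g}_2^{\delta} = X_2^{\,a + \sum_{i=1}^{j^*} b_i\, id_i^* + c\alpha}$. Recall that the $\gamma$ factor for the $\ell$-th SF key will be equal to $\delta$ for the same identity vector and when $\phi_\ell$ fixes $\alpha$, i.e., $\phi_\ell(\alpha) = \alpha$ (a key that can decrypt the challenge ciphertext), so it is a nominally semi-functional key that will always decrypt the challenge ciphertext. If the $\ell$-th oracle query is for the extraction oracle, the value of $\delta = a + \sum_{i=1}^{j^*} b_i\, id_i^* + c \cdot \alpha$ mod $p_2$ is uncorrelated to $\gamma = a + \sum_{i=1}^{j} b_i\, id_i + c \cdot \phi_\ell(\alpha)$ since $a, b_1, \ldots, b_H, c, \alpha$ are only known modulo $p_1$ and:

- Case 1: $\mathrm{ID}_\ell$ is not a prefix of ID*. There exists some $i \in [1, j^*]$ such that $id_i^* \neq id_i$. Then $a + b_i\, id_i$ is uncorrelated (footnote 9) to $a + b_i\, id_i^*$ modulo $p_2$. It implies $\gamma$ is uncorrelated to $\delta$ since $a$ and $b_i$ are randomly chosen from $\mathbb{Z}_N$.

- Case 2: $\mathrm{ID}_\ell$ is a prefix of ID* and $\phi_\ell(\alpha) \neq \alpha$. Then $a + c \cdot \phi_\ell(\alpha)$ is uncorrelated (footnote 10), and $\gamma$ is uncorrelated to $\delta$ since $a, c$ are random elements of $\mathbb{Z}_N$.

By the definition of the security model, the adversary cannot ask for any extraction oracle query with $\mathrm{ID}^* = \mathrm{ID}_\ell$ and $\phi_\ell(\alpha) = \alpha$.

If the $\ell$-th oracle query is for delegation, since $\phi_1 = \phi_j$ and $\phi_1$ is isomorphic,

$$g_2^{\gamma} = \phi_1\bigl(Z_2^{\,a + \sum_{i=1}^{j-1} b_i\, id_i + c\alpha}\bigr) \cdot \phi_j\bigl(Z_2^{\,b_j\, id_j}\bigr) = \phi_1\bigl(Z_2^{\,a + \sum_{i=1}^{j} b_i\, id_i + c\alpha}\bigr).$$

- Case 1: If $\mathrm{ID}_\ell$ is not a prefix of ID*, it is also uncorrelated to the value of $\hat{g}_2^{\delta} = X_2^{\,a + \sum_{i=1}^{j^*} b_i\, id_i^* + c \cdot \alpha}$, due to a distribution analysis similar to the case of the extraction oracle.

- Case 2: If $\mathrm{ID}_\ell$ is a prefix of ID*, we have $g_2 = \phi_2(Z_2)$. Hence

  $$\gamma = \Bigl(a + \sum_{i=1}^{j} b_i\, id_i + c\alpha\Bigr) \cdot \log_{\phi_2(Z_2)} \phi_1(Z_2).$$

  $\gamma$ is correctly distributed as $\log_{\phi_2(Z_2)} \phi_1(Z_2)$ is randomly distributed in $\mathbb{Z}_N$.

So, $\mathcal{B}$ can break Assumption 2 if $\mathcal{A}$ can distinguish $\mathrm{Game}_{\ell-1}$ and $\mathrm{Game}_{\ell}$.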


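Lemma 8. Given an adversary $\mathcal{A}$ such that $\mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{q}) - \mathrm{Adv}_{\mathcal{A}}(\mathrm{Game}_{final}) = \epsilon$, we can construct an algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking the $\Phi_e$-oracle DBDH assumption.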


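Footnote 9: The case that $id_i^* \neq id_i$ and $id_i^* = id_i$ mod $p_2$ is eliminated by $\mathrm{Game}_{res}$.
Footnote 10: The case that $\phi_\ell(\alpha) \neq \alpha$ and $\phi_\ell(\alpha) = \alpha$ mod $p_2$ can be eliminated by an extra game similar to $\mathrm{Game}_{res}$ considering $a + c \cdot \alpha$ modulo $p_2$. We omit the repetitive details.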
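Proof. Given $(g, g^{\alpha} X_2, X_3, g^{s} Y_2, Z_2, v, v^{\alpha}, v^{\alpha s}, T)$ and access to an oracle $\mathcal{O}$ from the $\Phi_e$-oracle DBDH assumption, $\mathcal{B}$ chooses random $a, b_1, \ldots, b_H \in \mathbb{Z}_N$ and sets

$$g_1 = g, \qquad h_1 = g^{a}, \qquad u_i = g^{b_i}, \qquad v_1 = v, \qquad \hat{e}(g_1, g_1)^{\alpha} = \hat{e}(g, g^{\alpha} X_2).$$

$\mathcal{B}$ implicitly sets msk $= \alpha$. $\mathcal{B}$ sends the master public key mpk to $\mathcal{A}$.

To compute the semi-functional secret key, $\mathcal{B}$ randomly picks $r \in \mathbb{Z}_N$, $R_2, R_2', R_{2,j+1}, \ldots, R_{2,H} \in G_{p_2}$ and $R_3, R_3', R_{3,j+1}, \ldots, R_{3,H} \in G_{p_3}$; then returns:

$$K_1 = (g^{\alpha} X_2) \cdot (h_1 u_1^{id_1} \cdots u_j^{id_j} v_1^{\alpha})^{r} \cdot R_2 R_3, \qquad K_2 = g^{r} \cdot R_2' R_3',$$

$$D_{j+1} = u_{j+1}^{r} \cdot R_{2,j+1} R_{3,j+1}, \quad \ldots, \quad D_H = u_H^{r} \cdot R_{2,H} R_{3,H}.$$

If it is an RKA-delegation oracle query with input $\phi_d = (\phi_1, \phi_2, \phi_j, \ldots, \phi_H)$, $\mathcal{B}$ returns $sk_{\mathrm{ID}_k} \leftarrow$ Delegate(mpk, $(\phi_1(K_1), \phi_2(K_2), \phi_j(D_j), \ldots, \phi_H(D_H))$, $id_j$).

If it is a related key query with input $\phi_e$, then $\mathcal{B}$ asks $\mathcal{O}(\phi_e)$ and obtains $(g^{\phi_e(\alpha)} W_2, v^{\phi_e(\alpha)} W_2')$. $\mathcal{B}$ returns

$$K_1 = (g^{\phi_e(\alpha)} W_2) \cdot (h_1 u_1^{id_1} \cdots u_j^{id_j} \cdot v^{\phi_e(\alpha)} W_2')^{r} \cdot R_2 R_3, \qquad K_2 = g^{r} \cdot R_2' \cdot R_3'.$$

Therefore, $\mathcal{B}$ can answer all extraction oracle queries. Note that $\mathcal{B}$ can check if $\phi_e(\alpha) = \alpha$ by checking if $g^{\phi_e(\alpha)} W_2 / (g^{\alpha} X_2)$ is in the subgroup $G_{p_2}$ but not $G_{p_1}$ and $G_{p_3}$. This is easily doable with the help of $g \in G_{p_1}$ and $X_3 \in G_{p_3}$.

Finally, $\mathcal{B}$ picks a random bit $b'$ and computes the SF challenge ciphertext:

$$C_0^* = M_{b'}^* \cdot T, \qquad C_1^* = (g^{s} Y_2), \qquad C_2^* = (g^{s} Y_2)^{\,a + \sum_{i=1}^{j^*} b_i\, id_i^*} \cdot v^{\alpha s}.$$

If $T = \hat{e}(g, g)^{\alpha s}$, $\mathcal{B}$ simulates $\mathrm{Game}_q$. Otherwise, $\mathcal{B}$ simulates $\mathrm{Game}_{final}$. If $\mathcal{A}$ can distinguish, $\mathcal{B}$ can break the $\Phi_e$-oracle DBDH assumption.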
Abstract. Digital signatures are widely deployed to authenticate the source of incoming information, or to certify data integrity. Common signature verification procedures return a decision (accept/reject) only at the very end of the execution. If interrupted prematurely, however, the verification process cannot infer any meaningful information about the validity of the given signature. We notice that this limitation is due solely to the algorithm design, and it is not inherent to signature verification.

In this work, we provide a formal framework to handle interruptions during signature verification. In addition, we propose a generic way to devise alternative verification procedures that progressively build confidence in the final decision. Our transformation builds on a simple but powerful intuition and applies to a wide range of existing schemes considered to be post-quantum secure, including the NIST finalist Rainbow.

While the primary motivation of progressive verification is to mitigate 
unexpected interruptions, we show that verifiers can leverage it in two 
innovative ways. First, progressive verification can be used to intentionally 
adjust the soundness of the verification process. Second, progressive veri- 
fications output by our transformation can be split into a computationally 
intensive offline set-up (run once) and an efficient online verification that 
is progressive. 


Keywords: Digital Signatures · Amortized Efficiency · Flexible Verification · Progressive Verification · Post-Quantum Security


1 Introduction 


Digital signatures allow one party (the signer) to use her secret key to authenticate a message in such a way that, at any later point in time, anyone holding the corresponding public key (the verifiers) can check its validity. The typical nature of signature verification procedures is monolithic: the validity of a signature is determined only after a sequence of tests is completed. In particular, if the execution is interrupted in medias res (Latin for "in the midst of things"), no conclusive answer can be drawn from the outcomes of the partial tests. Although this monolithic nature is not a burden in many application scenarios, e.g., validating financial transactions (Bitcoin protocol), installing certified software updates (Android OS), or delivering e-services (e-Health, electronic tax systems), it is a major limitation to the adoption of digital signatures in cyber-physical systems [24] and in secure eager or speculative executions [19], where the speed at which verification is performed plays a crucial role.

Le et al. [18] proposed to address unexpected interruptions using a new cryptographic primitive called signatures with flexible verification. In a nutshell, such schemes admit a verification algorithm that increasingly builds confidence in the validity of the signature as it performs more steps. In this way, at the moment of an interrupt, the verifier is left with a value $\alpha \in [0, 1] \cup \{\bot\}$ that probabilistically quantifies the validity of the signature, or rejects it. While the primary motivation of flexible verification is to mitigate unexpected interruptions, we observe that the overarching idea of progressive verification has further impact. In particular, progressive verification can be used to customize the soundness of the verification process. For example, a smart device may decide to verify at a 30-bit security level if the signatures come from specific sources or the battery is below 30%. From the theoretical perspective, progressive verification (as introduced later in this work) draws interesting connections between classical, information-theoretic and post-quantum security notions.


1.1 Our Contribution 


This work sets out to dismantle the monolithic nature of signature verification by designing new verification methods for existing signature schemes. Concretely, we investigate two approaches. The first one is to speed up the verification process for polynomially many signatures by the same signer, leveraging a one-time computation on the public key (efficient verification). The second approach is to re-design the verification process so that it allows one to extract sensible information even when the algorithm is executed only partially (progressive verification). In this setting it is of particular interest to investigate the security implications of this new model and what additional features it may bring.

In detail, we introduce formal definitions and security models for both efficient (Sect. 2) and progressive (Sect. 3) verification. In terms of realizations, we focus on a specific family of schemes that we call schemes with Mv-style verification (in brief, the verification includes matrix-vector multiplications). For schemes in this class, we propose two compilers, i.e., two information-theoretic transformations that turn monolithic Mv-style verifications into provably secure efficient (Sect. 4.1) or progressive (Sect. 4.2) ones. Our compilers apply to schemes based on multivariate polynomials, including the NIST finalist Rainbow [10,11] and LUOV [4], and to lattice-based schemes, including GPV [15] (hash & sign), MP [20] (Boyen/BonsaiTree), and GVW [16] (homomorphic). A large part of the security proof is devoted to a detailed analysis of the leakage due to verification queries (that now involve secret randomness). We consider this leakage analysis a result of independent interest, as it can be used to estimate leakage in similar information-theoretic approaches to provably secure algorithmic speed-ups or eager executions. Our models for efficient and progressive verification can easily be extended to include signatures with advanced properties, including ring, threshold, homomorphic multi-key, attribute-based and constrained signatures.


1.2 Related Work 


The problem of trading security for less computation during a verification has 
been considered first by Fischlin [13] and Armknecht et al. [1] in the context of 
message authentication codes (MACs). Le et al. [18] and by Taleb and Vergnaud 
[23] consider the same question for digital signatures. 

Le et al. [18] introduce the notion of flexible signatures and a construction 
based on the Lamport-Diffie one-time signature [17] with Merkle trees. Taleb and 
Vergnaud [23] put forth realizations of progressive verification for three specific 
signature schemes (RSA, ECDSA and GPV). Differently from us, both works 
demand a modification of the signing or key generation algorithm of the original 
signature scheme and also a time variable be input to the progressive or flexible 
verification. 

One main difference between our model and those of [13, 18, 23] is that we aim 
to capture progressive verification as an independent feature that can enhance 
existing schemes, rather than a standalone primitive that requires one to change 
some of the core algorithms of a signature scheme. This is in a way more challeng- 
ing as it leaves less design freedom when crafting these algorithms. In addition, 
we define progressive verification as a stateful algorithm in contrast to stateless 
(13, 18, 23]: although this makes our model slightly more involved, it is compara- 
bly more general and can capture more (existing) schemes. 

Our model for efficient verification is close the offline-online paradigm used 
in homomorphic authentication [2,9] and verifiable computation [14]; where a 
preprocessing is done with respect to a function f, and its result can be used to 
verify computation results involving the same f. An early instantiation of this 
technique for speeding up the verification of Rabin-Williams signatures appears 
in [3]. More recently, Sipasseuth et al. [22] investigate how to speed up lattice- 
based signature verification while reducing the memory (storage) requirements. 
The overall idea in [22] is similar to ours (and inspired to Freivalds’ Algorithm): 
to replace the inefficient matrix multiplication in the verification with a prob- 
abilistic check via an inner product computation. However, [22] focuses on the 
DRS signature [21], and investigates the trade-off between pre-computation time 
for verification and memory storage for this scheme only. Moreover, the work 
lacks a formal, abstract analysis of the security impact of such a shift in the ver- 
ification procedure. In contrast, we devise a general framework to model ‘more 
efficient’ and ‘partial’ signature verification. Although we developed our approach
independently of [22], our techniques can be seen as a generalization of what is
presented in [22].

Notation. In what follows, λ denotes the parameter for computational security
and X = (KeyGen, Sign, Ver) a tuple of algorithms identifying a digital signature
scheme that satisfies the syntax and the properties of correctness and existential
unforgeability as defined in [23].


2 Efficient Verification for Digital Signatures 


The core idea of efficient signature verification is to split the verification process 
into two steps. The first step is a one-time and signature-independent setup 
called ‘offline verification’. Its purpose is to produce randomness to derive a 
(short, secret) verification key svk from the signer’s public key pk. Note that 
the offline verification does not change the signature, which remains publicly 
verifiable; instead it ‘randomizes’ pk to obtain a concise verification key svk that 
essentially enables one to verify signatures with (almost) the same precision as 
the standard verification, but in a more efficient way. We remark that for secure
efficient verification svk should be hidden from the adversary; yet knowledge of
svk gives no advantage in forging signatures verified in the standard way using
just pk. The second verification step consists of an ‘online verification’ procedure.
It takes as input svk and can verify an unbounded number of message-signature
pairs performing significantly less computation than the standard verification
algorithm. For security, it is fundamental that svk remains unknown to the adversary.
We remark that generating svk during the offline phase achieves efficient online 
verification with no impact on the original signing or key generation algorithms, 
which was a drawback of previous work [18,23]. 


2.1 Syntax for Efficient Verification 


Our definition of efficient verification lets the verifier set the confidence level k 
at which she wishes to carry out the signature verification. Notably k determines 
the amount of computation to be performed and thus plays a central role in the 
security and the efficiency of the new verification. 


Definition 1 (Efficient Verification). A signature scheme X admits efficient 
verification if there exist two PPT algorithms (offVer, onVer) with the following 
syntax: 


offVer(pk, k): a randomized algorithm that, on input a public verification
key pk and a positive integer k ∈ {1,...,λ} (where λ is the security parameter
of X), returns a secret verification key svk.

onVer(svk, μ, σ): on input a secret verification key svk, a message μ, and a
signature σ, the efficient online verification algorithm outputs 0 (reject) or 1
(accept).


For convenience we will refer to the signature scheme augmented with the efficient
verification algorithms as X^E = (X, offVer, onVer), and to the integer value
k as the confidence level.


1 Here pk denotes a public verification key output by KeyGen. 

To be meaningful, a realization of efficient verification needs to satisfy the
properties of correctness, concrete amortized efficiency and security.


Definition 2 (Correctness of Efficient Verification). A scheme X^E =
(X, offVer, onVer) realizes efficient verification correctly if the following condition
holds. For a given security parameter λ, for any honestly generated key
pair (sk, pk) ← KeyGen(λ), for any message μ, for any signature σ such that
Ver(pk, μ, σ) = 1, and for any confidence level k ∈ {1,...,λ}, it holds that
Pr[onVer(svk, μ, σ) = 1 | svk ← offVer(pk, k)] = 1 for any choice of randomness
used in offVer.


Amortized efficiency relies on the fact that running offVer once and reusing its
output to run onVer r times is computationally less demanding than running
the standard verification Ver r times. To formalize this, we will use the function
cost(·) that, given as input an algorithm, returns its computational cost (in some
desired computational model). In addition, we parameterize concrete amortized
efficiency with two intertwined variables: r_0 (number of instances of verification),
and e_0 (ratio between the cost of r_0 efficient verifications and that of r_0 standard
verifications). The lower the value of r_0, the sooner X^E amortizes the computational
cost of offVer. The lower the value of e_0, the more efficient X^E is with respect to
the standard verification.


Definition 3 (Concrete Amortized Efficiency). A scheme X^E realizes
(r_0, e_0)-concrete amortized efficient verification for X if, given a security
parameter λ and a confidence level k, for any key pair (sk, pk) ← KeyGen(λ) and for
any pair (μ, σ) with μ ∈ M and σ such that Ver(pk, μ, σ) = 1, there exist a
non-negative integer r_0 and a real constant 0 < e_0 < 1 such that:

\[
\forall r \ge r_0: \qquad
\frac{\mathrm{cost}(\mathsf{offVer}(pk,k)) + r\cdot\mathrm{cost}(\mathsf{onVer}(svk,\mu,\sigma))}
     {r\cdot\mathrm{cost}(\mathsf{Ver}(pk,\mu,\sigma))} < e_0 \tag{1}
\]
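
To make the ratio in Definition 3 concrete, the following sketch evaluates it under a simple multiplication-count cost model. The cost model and all parameters are assumptions of this example, not taken from the paper: a matrix M with m rows and n columns, cost(Ver) ≈ m·n, cost(offVer) ≈ k·m·n (k vector-matrix products) and cost(onVer) ≈ k·n (k inner products). The helper names `ratio` and `smallest_r0` are hypothetical.

```python
from fractions import Fraction


def ratio(k: int, m: int, n: int, r: int) -> Fraction:
    """Amortized cost ratio of Definition 3 under the assumed cost model."""
    cost_off_ver = k * m * n   # k products c * M, each costing m * n multiplications
    cost_on_ver = k * n        # k inner products of length n
    cost_ver = m * n           # one full matrix-vector product
    return Fraction(cost_off_ver + r * cost_on_ver, r * cost_ver)


def smallest_r0(k: int, m: int, n: int) -> int:
    """Smallest number of verifications r for which the ratio drops below 1.

    Terminates only when k < m, i.e. when online verification is cheaper than Ver.
    """
    if k >= m:
        raise ValueError("need k < m for amortization to be possible")
    r = 1
    while ratio(k, m, n, r) >= 1:
        r += 1
    return r


if __name__ == "__main__":
    for k, m, n in [(1, 256, 512), (5, 256, 512), (32, 64, 96)]:
        r0 = smallest_r0(k, m, n)
        print(k, m, r0, float(ratio(k, m, n, r0)))
```

Under this model the ratio simplifies to k/r + k/m, so the offline cost is amortized after roughly k verifications and the asymptotic saving is governed by k/m.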


2.2 Security Model for Efficient Verification 


Intuitively, X^E realizes efficient verification in a secure way if onVer accepts a
signature that would be rejected by Ver only with negligible probability. In the
security game (see Fig. 1), the adversary A has access to the signing oracle OSign
as well as the efficient verification oracle OonVer. The goal of the adversary is to
produce a signature σ* for a message μ* that was never queried to OSign and
for which Ver returns 0 (reject) and onVer returns 1 (accept).




cmvEUF(A, X, k)
  1: L_S ← ∅
  2: (pk, sk) ← KeyGen(1^λ)
  3: svk ← offVer(pk, k)
  4: (μ*, σ*) ← A^{OSign, OonVer}(pk, k)
  5: return (μ*, σ*)

Exp^{cmvEUF}_{A,X^E}(λ, k)
  1: (μ*, σ*) ← cmvEUF(A, X, k)
  2: if μ* ∈ L_S
  3:   return 0
  4: if Ver(pk, μ*, σ*) = 1
  5:   return 0
  6: b ← onVer(svk, μ*, σ*)
  7: return b

OSign_sk(μ)
  1: L_S ← L_S ∪ {μ}
  2: σ ← Sign(sk, μ)
  3: return σ

OonVer_svk(μ, σ)
  1: b ← onVer(svk, μ, σ)
  2: return b


Fig. 1. Security model for efficient verification of signatures: existential unforgeability 
under adaptive chosen message and verification attack (security game, experiment and 
oracles). A is a PPT algorithm that can query the oracles in an adaptive and parallel 
way. Ls is the list of messages queried to the signing oracle. 


Definition 4 (Security of Efficient Verification). A scheme X^E realizes a
secure efficient verification for X if for a given security parameter λ and for any
confidence level k ∈ {1,...,λ}, for all PPT adversaries A the success probability
in the cmvEUF experiment reported in Fig. 1 is negligible, i.e.:

\[
\mathsf{Adv}^{\mathsf{cmvEUF}}_{\mathcal{A},X^E}(\lambda,k)
 = \Pr\big[\mathsf{Exp}^{\mathsf{cmvEUF}}_{\mathcal{A},X^E}(\lambda,k) = 1\big]
 \le \varepsilon(\lambda,k).
\]


Line 5 of the cmvEUF experiment excludes forgeries against the original signature 
scheme. This is justified by the correctness of efficient verification and by the 
fact that X is existentially unforgeable. Notably, both the security game and 
the advantage depend on the confidence level k and assume all algorithms are 
entirely executed. 
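
The winning condition of the experiment in Fig. 1 can be sketched as a small test harness. Everything below is illustrative: the function `exp_cmv_euf` takes the scheme algorithms as callables, and the toy "scheme" (a keyed hash with pk = sk) is only a placeholder to exercise the control flow, not a signature scheme and not part of the paper.

```python
import hashlib
import os


def exp_cmv_euf(adversary, keygen, sign, ver, off_ver, on_ver, k):
    """Run the experiment: return 1 only if onVer accepts what Ver rejects."""
    signed = set()                      # the list L_S of signed messages
    pk, sk = keygen()
    svk = off_ver(pk, k)

    def o_sign(mu):
        signed.add(mu)
        return sign(sk, mu)

    def o_on_ver(mu, sigma):
        return on_ver(svk, mu, sigma)

    mu_star, sigma_star = adversary(pk, k, o_sign, o_on_ver)
    if mu_star in signed:               # message was queried to the signing oracle
        return 0
    if ver(pk, mu_star, sigma_star) == 1:
        return 0                        # a forgery against the original scheme is excluded
    return on_ver(svk, mu_star, sigma_star)


# --- toy placeholder algorithms (hypothetical, for illustration only) ---
def toy_keygen():
    key = os.urandom(16)
    return key, key

def toy_sign(sk, mu):
    return hashlib.sha256(sk + mu).digest()

def toy_ver(pk, mu, sigma):
    return int(toy_sign(pk, mu) == sigma)

def toy_off_ver(pk, k):
    return pk

def faithful_on_ver(svk, mu, sigma):
    return toy_ver(svk, mu, sigma)      # agrees with Ver on every input

def sloppy_on_ver(svk, mu, sigma):
    return 1                            # accepts everything: trivially insecure

def replay_adversary(pk, k, o_sign, o_on_ver):
    mu = b"hello"
    return mu, o_sign(mu)               # outputs a signature obtained from the oracle

def blind_adversary(pk, k, o_sign, o_on_ver):
    return b"fresh", b"\x00" * 32       # invalid signature on an unqueried message
```

An online verifier that always agrees with Ver can never make the experiment output 1, whereas the accept-everything verifier loses to the blind adversary.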


3 Progressive Verification for Digital Signatures 


The goal of progressive verification is to incrementally increase the confidence on 
the validity of a signature, for a given message against a public key. Intuitively, 
the “confidence” should be proportional to the amount of computation invested: 
the further in the execution we go, the higher the accuracy of the decision, and 
thus the confidence of the final outcome (accept/reject). 


3.1 Signatures with Progressive Verification


Taleb and Vergnaud give a very intuitive definition of progressive verification 
for digital signatures [23]. They model digital signatures with progressive veri- 
fication as a 4-tuple of PPT algorithms (KeyGen, Sign, Ver, ProgVer) such that: 
X = (KeyGen, Sign, Ver) is a correct digital signature scheme; and ProgVer takes
as input a public verification key pk, a message μ, a signature σ, and some timing
parameter t, and outputs α ∈ ([0, 1] ∩ ℝ) ∪ {⊥}, interpreted as an estimate
of the accuracy of its decision on whether the signature is valid. Moreover, the
scheme satisfies the following properties:

Correctness If for some tuple of inputs ProgVer(pk, μ, σ, t) outputs ⊥, then
Ver(pk, μ, σ) = 0.

Security If for some tuple of inputs ProgVer(pk, μ, σ, t) outputs α ∈ [0, 1],
then Pr[Ver(pk, μ, σ) = 0] ≤ 1 − α (where the probability is taken over the random
coins of ProgVer).

In a nutshell, if α = ⊥, the progressive verification deems the signature to be
invalid (with 100% accuracy). If α ∈ [0, 1], the algorithm considers the signature
valid, and α tells how accurate this statement is. Since progressive verification
may be interrupted at any arbitrary point t during its execution, in practice α is
(the output of) a function α_prog(t) that “converts” the progress in the verification
process into a value representing the accuracy of a positive outcome.


Shortcomings. First, like [18], [23] treats signatures with progressive
verification as a standalone primitive. In contrast, we view progressive verification
as a feature that can augment existing schemes without requiring change to
the core algorithms. Second, the definition lacks a precise notion of time com- 
plexity and does not model how unexpected interrupts are handled. The model 
we introduce in the remainder of this section takes care of these aspects. In addi- 
tion, we generalize progressive verification to be (possibly) stateful, which can 
capture more signature schemes as well as reuse the same syntax to model both 
efficient and progressive verification (details in the full version [6]). 


3.2 Syntax for Progressive Verification 


In order to model progressive verification as an add-on algorithm we need to 
derive from Ver an alternative algorithm ProgVer (as introduced in Sect. 3.1), 
that builds confidence on the final verification outcome in an increasing way. 
Without loss of generality, this task boils down to identifying a sequence of 
T + 1 atomic instructions that we call ProgStep with the following properties. 
Each ProgStep performs a check of some sort on the input it receives. If one step
fails, the progressive verification returns α = ⊥. If none of the initial t steps fails,
the progressive verification returns the output of a function α_prog(t) ∈ [0, 1] that
measures the probability that the input will be accepted by Ver. The fact of increasingly
building confidence is reflected by functions α_prog that are non-decreasing
in t, the number of instructions checked before returning the answer. Figure 1 in
[23] provides an intuitive and graphical representation of this statement.

Definition 5 (Stateful Progressive Verification). Let T ∈ ℤ_{≥0} and α_prog :
{0,...,T} → [0, 1] be an efficiently computable function. A signature scheme X
admits (T, α_prog)-progressive verification if there exists a stateful PPT algorithm
ProgVer that takes as input pk, μ, σ and some interruption parameter t ∈ ℤ_{≥0},
outputs α ∈ ([0, 1] ∩ ℝ) ∪ {⊥}, and satisfies the following syntax:


ProgVer(st, pk, μ, σ, t)
  1: (initialize α)
  2: if t < 0: return ⊥
  3: if t > T: set t ← T
  4: for j = 0,...,t
  5:   (b, st) ← ProgStep_j(st, pk, μ, σ)
  6:   if (b = 0): return ⊥
  7:   else (b = 1): α ← α_prog(j)
  8: return α


For convenience we will refer to the signature scheme augmented with progressive
verification as X^P = (X, ProgVer, T, α_prog).


Concretely, ProgVer is made of T + 1 algorithms ProgStep_j, for j = 0 to T, that
progressively update the state st. We remark that the formalization into steps is
without loss of generality: Ver realizes a trivial progressive verification for T = 0
where the only step is Ver itself. Finally, the interruption value t is input to
ProgVer only, and it is not given to each ProgStep_j. Thus our syntax models the
fact that the steps are agnostic of the interruption value and must work without
knowing when to stop, which is essential to capture arbitrary interruptions.
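
The control flow of ProgVer can be sketched as follows. This is a minimal illustration under stated assumptions: `None` stands for the reject symbol ⊥, the function returns the updated state alongside α to make statefulness explicit, and the toy steps (character-by-character comparison) and the confidence function `toy_alpha` are hypothetical and unrelated to any real signature scheme.

```python
from fractions import Fraction

BOTTOM = None  # stands for the reject symbol


def prog_ver(steps, alpha_prog, st, pk, mu, sigma, t):
    """Run steps 0..t and return (alpha, state); steps never see t."""
    T = len(steps) - 1
    if t < 0:
        return BOTTOM, st
    t = min(t, T)                       # interruption values beyond T are clipped
    alpha = BOTTOM
    for j in range(t + 1):
        b, st = steps[j](st, pk, mu, sigma)
        if b == 0:
            return BOTTOM, st           # one failed check rejects with certainty
        alpha = alpha_prog(j)           # confidence grows with completed steps
    return alpha, st


def make_toy_step(j):
    """Toy step j: compare the j-th symbol of sigma with the j-th symbol of pk."""
    def step(st, pk, mu, sigma):
        ok = int(j < len(sigma) and sigma[j] == pk[j])
        return ok, (st or 0) + 1        # state counts how many steps were run
    return step


toy_steps = [make_toy_step(j) for j in range(4)]   # T = 3

def toy_alpha(j):
    return Fraction(j + 1, 4)           # non-decreasing in j
```

For example, `prog_ver(toy_steps, toy_alpha, None, "abcd", "m", "abcd", 2)` stops after three steps, while an interruption value larger than T is clipped to T.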

Correctness essentially states that signatures accepted by the standard verifi- 
cation should also be accepted by the progressive one, with the highest confidence 
allowed by the number of steps performed. 


Definition 6 (Progressive Verification Correctness). Let X^P be a signature
scheme with progressive verification; ProgVer satisfies progressive verification
correctness if, for any value t ∈ {0,...,T}, for any given security parameter
λ, for any key pair (sk, pk) ← KeyGen(λ), for any admissible state st generated
by ProgVer, for any admissible message μ, given a signature σ such that
Ver(pk, μ, σ) = 1 it holds that: Pr[ProgVer(st, pk, μ, σ, t) = α_prog(t)] = 1.


Efficient vs. Progressive Verification. At first glance, efficient verification
and progressive verification seem to have the common goal of reducing the
computational cost of a signature verification. However, the way this objective
is achieved in the two models is quite different. In progressive verification, the
verifier (and thus each ProgStep_j) is unaware of when the computation will be
interrupted, and its execution is independent of t. In contrast, in efficient
verification the verifier (running offVer) determines the confidence level k prior to
any actual verification (running onVer). In the latter, the (online) verification
is aware of the confidence level k (seen as interruption value), and adapts its
execution to k.


Stateful vs Stateless Verification. We define progressive verification as stateful.
This allows us to keep the framework as general as possible. Stateless progressive
verification, à la [18,23], can be obtained by setting st to ∅; this also removes
the need for analyzing any cross-query leakage due to state reuse.


3.3 Security Model for Progressive Verification


Our notion of unforgeability states that signatures rejected by the standard
verification should also be rejected by the progressive one, except for an inaccuracy
factor due to interruptions. More formally, Ver and ProgVer should have
the same behavior (accept/reject) with discrepancies happening with probability
negligibly close to 1 − α_prog(t).

Our security game has three main differences compared to [18]: 

State: in order to take into account that ProgVer maintains a possibly non-trivial
state, we allow the adversary A to interact with the progressive verification
oracle OProgVer during the query phase, as well as the signing oracle OSign, in
a concurrent manner.

Interruption: queries to OProgVer have the form (μ, σ, t'), where t' is the
desired interruption value submitted by A (and chosen adaptively).

Output: instead of a single bit, our experiment returns a pair (b, t*). The
bit b ∈ {0, 1} flags the absence or the potential presence of a forgery, while
t* ∈ {0,...,T} reports the interruption position used in the final progressive
verification. Including t* in the output of the experiment allows us to measure
security in terms of how close the probability of A winning the experiment is to
the expected accuracy value 1 − α_prog(t*).


progEUF(X^P, A)
  1: L_S ← ∅
  2: st ← ∅
  3: (pk, sk) ← KeyGen(1^λ)
  4: (μ*, σ*, t') ← A^{OSign, OProgVer}(pk)
  5: return (μ*, σ*, t')

Exp^{progEUF}_{A,X^P}(λ)
  1: (μ*, σ*, t') ← progEUF(X^P, A)
  2: β ← Ver(pk, μ*, σ*)
  3: t* ← OInt(t')
  4: α ← ProgVer(st, pk, μ*, σ*, t*)
  5: if μ* ∈ L_S ∨ α = ⊥ ∨ β = 1
  6:   return (0, t*)
  7: return (1, t*)

OSign_sk(μ)
  1: L_S ← L_S ∪ {μ}
  2: σ ← Sign(sk, μ)
  3: return σ

OProgVer(μ, σ, t')
  1: t ← OInt(t')
  2: α ← ProgVer(st, pk, μ, σ, t)
  3: return α


Fig. 2. Security model for progressive verification of signatures: existential unforgeability
under adaptive chosen message and progressive verification attack (security game,
experiment and oracles). A can query the oracles adaptively, in parallel and polynomially
many times in λ. L_S is the list of messages queried to the signing oracle.


Definition 7 (Security of Progressive Verification (progEUF)). Let X be
a signature scheme that admits a progressive verification realization X^P. X^P
realizes a secure progressive verification for X if for any given security parameter
λ, for all PPT adversaries A the success probability in the progEUF experiment
in Fig. 2 is negligible, i.e.:

\[
\mathsf{Adv}^{\mathsf{progEUF}}_{\mathcal{A},X^P}(\lambda)
 = \Pr\big[\mathsf{Exp}^{\mathsf{progEUF}}_{\mathcal{A},X^P}(\lambda) = (1,t^*)\big]
   - \big(1 - \alpha_{\mathsf{prog}}(t^*)\big) \le \varepsilon(\lambda).
\]

Intuitively, Definition 7 states that an adversary has only negligible probability
of making ProgVer output a confidence value α* higher than the expected
one. Let bad(t) denote the probability of accepting a forgery after t verification
steps. Then by setting α_prog(t) = 1 − bad(t), we get

\[
\mathsf{Adv}^{\mathsf{progEUF}}_{\mathcal{A},X^P}(\lambda)
 = \Pr\big[\mathsf{Exp}^{\mathsf{progEUF}}_{\mathcal{A},X^P}(\lambda) = (1,t^*)\big]
   - \mathsf{bad}(t^*) \le \varepsilon(\lambda).
\]


Modelling Interruptions. In [18], unexpected interruptions are modeled via
an interruption oracle iOracle(λ) that returns a value t ∈ {0,...,T} used by
the progressive verification. However, it is not clear whether A may control
iOracle or not. We overcome these ambiguities by letting A output t' with every
progressive verification query. For the purpose of this work, we consider the
strongest security model in which the interruption oracle returns the adversary’s
value, i.e., t ← OInt(t') with t = t'. This resembles side-channel attack settings,
where A may try to freeze the execution of the verification. It is possible to
relax and generalize our model by setting a different interruption oracle OInt,
programmed at the beginning of the game. At each verification query, OInt takes
as input the adversary’s suggestion for an interruption position t' and outputs
the value t to be used by the progressive verification. In case t = t', we are
modelling side-channel attacks, but we can also let t be independent of t'. A
realistic definition of OInt is outside the scope of this work.


4 Constructions 


In this section, we present generic transformations (compilers) that augment a 
signature scheme X with either efficient (Sect. 4.1) or progressive verification 
(Sect. 4.2). 

Our technique works for a specific class of signature schemes that we call
signatures with Mv-style verification. In such schemes, Ver can be seen as the combination
of two types of verification checks: a matrix-vector multiplication (referred to 
as Mv = 0, for appropriate matrix M and vector v) and other generic checks 
(collected in the Check subroutine), see Fig.3 for details and an explanatory 
example. Among the schemes with Mv-style verification we highlight some of the 
seminal lattice-based signatures [7,8, 15,20], homomorphic signatures [5, 12, 16], 
and multivariate signatures [4,11]. 


4.1 A Compiler for Efficient Mv-Style Verifications 


We present a generic way to realize efficient verification for signatures with Mv- 
style verification, whenever the computational complexity of Ver is dominated 

Ver(pk, μ, σ)
  // INITIALIZE ACCEPTANCE BITS
  1: b1 ← 0, b2 ← 0
  // SPLIT pk INTO MATRIX - AUX. DATA
  2: parse pk = (PK, PK.aux)
  // ADDITIONAL VERIFICATION CHECKS
  3: b1 ← Check(PK.aux, μ, σ)
  // FORMATTING Mv-STYLE CHECK
  4: (M, v) ← GetMv(pk, μ, σ)
  // MATRIX-VECTOR MULT. CHECK
  5: if (M · v = 0)
  6:   b2 ← 1
  7: return (b1 ∧ b2)

Example: Ver(pk, μ, σ) for GPV08 [15]
  1: b1 ← 0, b2 ← 0
  2: parse pk = (PK, PK.aux)
       set PK ← A
       set PK.aux ← (H, β)
  3: Check(PK.aux, μ, σ):
       if ||σ|| ≤ β set b1 ← 1
  4: GetMv(pk, μ, σ):
       set M ← [A | −I_{rows(A)}]
       set u ← H(μ) ∈ Z_q^n
       set v ← [σ | u]^T
  5: if (M · v = 0_{rows(A)×1} mod q)
  6:   set b2 ← 1
  7: return (b1 ∧ b2)


Fig. 3. General structure of a signature with Mv-style verification (on the left); an 
instructive example: the GPV08 [15] signature verification (on the right). 


by the matrix-vector multiplication, i.e., cost(Check) ≪ cost(Mv) ≈ mn field
multiplications (for M ∈ Z_q^{m×n}).

Our compiler for efficient verification is detailed in Fig.4 with a sketch of 
instantiation for the LBS scheme GPV08 [15] as a running example. Further 
details on this scheme as well as instantiations and details on the concrete effi- 
ciency estimates for MP12 [20], Rainbow [11] and LUOV [4], are deferred to the 
full version [6]. Table 1 summarizes the efficiency results. We obtain secure effi- 
cient online verification using as little as 0.4% (resp. 50%) of the computational 
cost of the standard verification for lattice-based signatures on exponentially 
large fields (resp. for Rainbow). 


Overview of Our Technique. Our transformation takes as input X, a signature
scheme with Mv-style verification, and returns X^E = (X, offVer, onVer)
that securely instantiates efficient verification for X. The heart of our compiler
leverages the fact that for any pair of vectors σ and u (often derived from the
message μ), and for any matrix A (of appropriate dimensions), if A·σ = u then
for any random vector c (of appropriate dimension) it holds that c·(A·σ) = c·u.
Collecting variables on the left-hand side yields (c·[A | −I_n])·[σ; u] = 0. Thus one
can precompute the vector z ← c·[A | −I_n] and run the efficient online verification
check z·v = 0, where v ← (σ, u). In a nutshell, the idea is to replace the
matrix-vector multiplication with a vector-vector multiplication in a sound way.
Correctness and efficiency are immediate. Soundness essentially comes from the

(a) The offline verification algorithm.

offVer(pk, k)
  1: parse pk = (PK, PK.aux)
       // e.g., in GPV08 PK = A, PK.aux = (H, β)
  2: M ← GetM(PK)
       // e.g., in GPV08 M = [A | −I_{n×n}]
  3: if (k > rows(M) ∨ k < 1) return ⊥
  // GENERATE RANDOMIZED KEY
  4: Z ← GetZ(M, k)
       i:   z_0 ← 0_{1×cols(M)}   // for good indexing
       ii:  for j = 1,...,k
       iii:   c ←$ Z_q^{1×rows(M)}
       iv:    z ← c·M ∈ Z_q^{1×cols(M)}
       v:     if z ∈ ⟨z_0,...,z_{j−1}⟩_q go to iii.
       vi:    z_j ← z   // store new lin. indep. vect.
       vii: set Z ← [z_1^T | ... | z_k^T]^T
  5: return svk ← (k, Z, PK.aux)

(b) The online verification algorithm.

onVer(svk, μ, σ)
  // LIGHTWEIGHT CHECKS
  1: if Check(PK.aux, μ, σ) = 0
       // e.g., in GPV08 this is ||σ|| ≤ β
  2:   return 0
  // FORMATTING FOR EFF. VER
  3: (Z', v) ← GetZV(svk, μ, σ)
  4: parse Z' = [z_1^T | ... | z_k^T]^T
  5: parse v = [v_1 | ... | v_k]^T
  // LINE-BY-LINE INNER PRODUCTS
  6: for j = 1,...,k
  7:   if z_j · v_j ≠ 0 mod q
  8:     return 0
  9: return 1


Fig. 4. Our compiler for efficient verification of signatures with Mv-style verification. 
The four scheme-dependent subroutines are: parse pk and GetZ (in offVer); Check and 
GetZV (in onVer). The computational complexity of onVer is linear in k, the chosen 
confidence level. 


fact that if z·v = 0, then with all but negligible probability the original system
of linear equations A·σ = u is satisfied too, as proven in Theorem 1.
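
The precompute-then-check idea can be illustrated with a short, self-contained sketch. It is a toy under several assumptions: the modulus `Q` is an arbitrary large prime, the dimensions are tiny, the scheme-specific Check subroutine is left out, and the resampling step that enforces linear independence of the rows of Z (step v of GetZ) is omitted for brevity. The function names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime modulus (assumption of this sketch)


def mat_vec(M, v, q=Q):
    """Matrix-vector product M * v mod q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]


def vec_mat(c, M, q=Q):
    """Row-vector times matrix c * M mod q."""
    return [sum(c[i] * M[i][j] for i in range(len(M))) % q
            for j in range(len(M[0]))]


def off_ver(M, k, rng, q=Q):
    """Offline phase: k random combinations z_j = c_j * M of the rows of M."""
    rows = len(M)
    return [vec_mat([rng.randrange(q) for _ in range(rows)], M, q)
            for _ in range(k)]


def on_ver(Z, v, q=Q):
    """Online phase: k inner products instead of one matrix-vector product."""
    return all(sum(a * b for a, b in zip(z, v)) % q == 0 for z in Z)


rng = random.Random(0)
n, m, k = 4, 6, 2
A = [[rng.randrange(Q) for _ in range(m)] for _ in range(n)]
sigma = [rng.randrange(Q) for _ in range(m)]
u = mat_vec(A, sigma)                            # so that A * sigma = u holds

# M = [A | -I_n], so that M * (sigma, u) = A * sigma - u
M = [A[i] + [(-1 if i == j else 0) % Q for j in range(n)] for i in range(n)]
Z = off_ver(M, k, rng)

v_good = sigma + u
v_bad = sigma + [(u[0] + 1) % Q] + u[1:]         # one coordinate of u is altered
print(on_ver(Z, v_good), on_ver(Z, v_bad))
```

A satisfied system is always accepted, since Z·v = C·(M·v) = 0. A violated one is accepted only if every random row happens to be orthogonal to M·v, which for a modulus of this size is extremely unlikely.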


Security Analysis. Despite the construction being intuitive, analysing the leak- 
age due to verification queries that reuse the same svk is not trivial and is one 
main technical contribution of this result. 


Theorem 1. Let X be an existentially unforgeable signature scheme with Mv-style
verification (as in Fig. 3). The scheme X^E = (X, offVer, onVer) obtained
via our compiler depicted in Fig. 4 is existentially unforgeable under adaptive
chosen message and efficient verification attacks. Concretely, the advantage is
Adv^{cmvEUF}_{A,X^E}(λ, k) ≤ (q_V + 1)/(q^k − q_V), where k ∈ {1,...,rk(M)} denotes the
chosen confidence level that grows up to the rank of the matrix M, q_V = poly(λ) ≪ q^k is
a bound on the total number of verification queries and q is the modulus of the
algebraic structure on which X is built.


Remark. For simplicity, Theorem 1 considers only existential unforgeability. 
The statement and the proof actually adapt with ease to other security models 
such as strong and selective unforgeability. 

Table 1. A summary of the concrete efficiency achieved by various instantiations of our
compiler for efficient verification. In the table, k_0 denotes the minimum accuracy level
that realizes efficient verification with 128 bits of security, i.e., for which Pr[Bad] < 2^{−128}
is negligible (cf. proof of Theorem 1, with q_V = 2^{30}); r_0 is the smallest positive integer
for which (cost(offVer(pk, k_0)) + r·cost(onVer)) / (r·cost(Ver)) < 1, and e_0 is a (tight)
upper bound on this ratio.

Ring or field size               Min. accuracy level     Concrete amortized          Online
(representative schemes)         for 128-bit security    efficiency (Definition 3)   efficiency

exponential q                    k_0 = 1                 (r_0 = 2, e_0 = 0.51)       < 0.4%
  FMNP [12]; GVW [16]
large poly. q                    k_0 = 5                 (r_0 = 6, e_0 = 0.86)       < 2%
  Boyen [7]; GPV [15]; MP [20]
small poly.: q = 16              k_0 = 32                (r_0 = 65, e_0 = 0.99)      32/64 = 50%
  Rainbow [11] F_16-(32, 32, 32)


Proof. Let Win be the event {Exp^{cmvEUF}_{A,X^E}(λ, k) = 1}. Let i = 1 to q_V be the index
of the queries (μ_i, σ_i) submitted by A to the OonVer oracle. Define the family
of events bad_i (for i = 1 to q_V + 1) as:

\[
\mathsf{bad}_i := \{\mathsf{Ver}(pk,\mu_i,\sigma_i) = 0 \;\wedge\; \mathsf{onVer}(svk,\mu_i,\sigma_i) = 1\}
\]

where bad_{q_V+1} corresponds to A returning a valid forgery (μ*, σ*) :=
(μ_{q_V+1}, σ_{q_V+1}) at the end of the experiment. We can rewrite the winning
condition of the security experiment as Win = {bad_{q_V+1} ∧ μ* ∉ L_S}. Consider
the event Bad defined as “there exists at least one query index i in the game
execution for which bad_i occurs”. It is clear that

\[
\mathsf{Adv}^{\mathsf{cmvEUF}}_{\mathcal{A},X^E}(\lambda,k)
 = \Pr[\mathsf{Win}\wedge\mathsf{Bad}] + \Pr[\mathsf{Win}\wedge\neg\mathsf{Bad}]
 \le \Pr[\mathsf{Bad}] + \Pr[\mathsf{Win}\mid\neg\mathsf{Bad}]
\]

where the inequality comes from applying the definition of conditional probability
and upper bounding Pr[Win | Bad] and Pr[¬Bad] by 1.


We notice that Pr[Win | ¬Bad] is essentially the probability that the event
bad_i occurs only for i = q_V + 1 and never before, i.e.,

\[
\Pr[\mathsf{Win}\mid\neg\mathsf{Bad}] \le
\Pr\Big[\mathsf{bad}_{q_V+1} \,\Big|\, \bigwedge_{i=1}^{q_V}\neg\mathsf{bad}_i\Big].
\]

In order to bound Pr[Bad], we define events Bad_i (for i = 1 to q_V) as “bad_i
occurs for the first time at query i”, namely Bad_i = bad_i ∧ (∧_{j=1}^{i−1} ¬bad_j). Then
we have

\[
\Pr[\mathsf{Bad}] = \Pr\Big[\bigvee_{i=1}^{q_V}\mathsf{Bad}_i\Big]
 = \sum_{i=1}^{q_V}\Pr[\mathsf{Bad}_i]
 \le \sum_{i=1}^{q_V}\Pr\Big[\mathsf{bad}_i \,\Big|\, \bigwedge_{j=1}^{i-1}\neg\mathsf{bad}_j\Big]
\]

where the second equality holds because the events Bad_i are all disjoint, and the
inequality follows from applying the definition of conditional probability and
upper bounding Pr[∧_{j=1}^{i−1} ¬bad_j] by 1, for all i. Thus:

\[
\mathsf{Adv}^{\mathsf{cmvEUF}}_{\mathcal{A},X^E}(\lambda,k) \le
\sum_{i=1}^{q_V+1}\Pr\Big[\mathsf{bad}_i \,\Big|\, \bigwedge_{j=1}^{i-1}\neg\mathsf{bad}_j\Big]. \tag{2}
\]


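Lemma 1. For every i = 1 to q_V + 1, it holds that

\[
\Pr\Big[\mathsf{bad}_i = 1 \,\Big|\, \bigwedge_{j=1}^{i-1}\neg\mathsf{bad}_j\Big] \le \frac{1}{q^k-(i-1)}.
\]

The proof of Lemma 1 is deferred momentarily to let us complete the reasoning
that proves the theorem. Using the inequality provided by Lemma 1, it is
easy to see that

\[
\sum_{i=1}^{q_V+1}\Pr\Big[\mathsf{bad}_i = 1 \,\Big|\, \bigwedge_{j=1}^{i-1}\neg\mathsf{bad}_j\Big]
 \le \sum_{i=1}^{q_V+1}\frac{1}{q^k-(i-1)} \le \frac{q_V+1}{q^k-q_V}.
\]

Indeed, 1/(q^k − (i − 1)) ≤ 1/(q^k − q_V) for all integers i in [1, q_V + 1] and for all
q_V ∈ ℕ satisfying q_V < q^k. Thus the advantage is at most (q_V + 1)/(q^k − q_V), which
proves the bound on the advantage.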

Proof of Lemma 1. The goal of this proof is to give a generic structure for
estimating the leakage of information due to reuse of svk (i.e., probabilities in
Equation (2)); due to space constraints details appear only in [6].


To upper bound Pr[bad_i = 1 | ∧_{j=1}^{i−1} ¬bad_j] we need to analyze the information
leakage due to verification queries. First of all, by correctness
onVer(svk, μ_i, σ_i) = 0 ⇒ Ver(pk, μ_i, σ_i) = 0 and Ver(pk, μ_i, σ_i) = 1 ⇒
onVer(svk, μ_i, σ_i) = 1 for every possible svk generated by offVer from pk. Leakage
about svk happens in two cases: when an event bad_i occurs (OonVer accepts
where the standard verification would reject); and when OonVer rejects a query
(here A may learn that some combination of rows of pk must appear in svk).
Equation (2) gives us a way to bound the adversary’s advantage (and thus, the
magnitude of this leakage) in terms of the events bad_i and ¬bad_j.

Consider the i-th query (μ_i, σ_i) to OonVer. If the oracle returns 0, the adversary
learns that C·(M_i·v_i) ≠ 0 mod q. In other words, there is at least one
row of C ∈ 𝒞 := {C ∈ Z_q^{k×n} : rk(C) = k}, say c_j, that is not in the hyperplane
orthogonal to w_i := M_i·v_i, i.e., c_j·w_i ≠ 0 mod q. Note that A knows w_i
since (M_i, v_i) can be computed from pk, μ_i and σ_i. Let us introduce the
sets H_i ⊂ 𝒞 of full-rank matrices C ∈ 𝒞 whose rows are all orthogonal to w_i,
formally:

\[
H_i := \Big\{ C \in \mathcal{C} \;:\; C = \begin{pmatrix} c_1 \\ \vdots \\ c_k \end{pmatrix}
 \;\wedge\; c_j\cdot w_i = 0 \bmod q \;\; \forall j = 1,\dots,k \Big\}
\]


We assume A is able to pick the vectors w_i ∈ Z_q^n \ {0} of her choosing (e.g.,
by generating suitable pairs (μ_i, σ_i)). This assumption is generous as it gives
the adversary a large amount of power and freedom in the game. The restriction
w_i ≠ 0 is technical: it guarantees Ver(pk, μ_i, σ_i) = 0, which is a necessary
condition for OonVer leaking information about svk.

At the first verification query (μ_1, σ_1), A has no information about C beyond
the fact that it was uniformly sampled from the set 𝒞 := {C ∈ Z_q^{k×n} : rk(C) = k}.
Therefore, for any choice of w_1 ≠ 0, if the event bad_1 occurs, then
bad_1 = {C·w_1 = 0 mod q ∧ C ∈ 𝒞}, thus Pr[bad_1] = Pr[C·w_1 = 0 mod q ∧ C ∈ 𝒞] =
|H_1| / |𝒞|. The first (rejected) verification query leaks the fact that C ∈ 𝒞 \ H_1.
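
For tiny parameters the probability of the first bad event can be checked by brute force, as the fraction of full-rank matrices C whose rows are all orthogonal to w_1. The sketch below does this for k = 2 rows; the parameters q = 3, n = 3 and the vector w = (1, 0, 0) are arbitrary choices for illustration, and the rank test used is only valid for k = 2 and prime q.

```python
from fractions import Fraction
from itertools import product


def full_rank_pairs(q, n):
    """Yield all 2 x n matrices over Z_q (q prime) with linearly independent rows."""
    vectors = list(product(range(q), repeat=n))
    for r1 in vectors:
        if not any(r1):
            continue                                   # first row must be nonzero
        multiples = {tuple((a * x) % q for x in r1) for a in range(q)}
        for r2 in vectors:
            if r2 not in multiples:                    # second row outside span(r1)
                yield (r1, r2)


def orthogonal(row, w, q):
    return sum(a * b for a, b in zip(row, w)) % q == 0


q, n, k = 3, 3, 2
w = (1, 0, 0)
C_all = list(full_rank_pairs(q, n))
H = [C for C in C_all if all(orthogonal(row, w, q) for row in C)]
pr_bad_1 = Fraction(len(H), len(C_all))
print(len(C_all), len(H), pr_bad_1)
```

With these parameters there are 624 full-rank matrices, 48 of which have both rows orthogonal to w, so the first-query probability is 1/13, below q^{-k} = 1/9.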

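For the second verification query, without loss of generality let w_2 be linearly
independent from w_1, i.e., w_2 ∉ ⟨w_1⟩_q. In this case, we have

\[
\begin{aligned}
\Pr[\mathsf{bad}_2 \mid \neg\mathsf{bad}_1]
 &= \Pr\big[C\cdot w_2 = 0 \bmod q \;\big|\; C\in\mathcal{C} \wedge C\in(\mathcal{C}\setminus H_1)\big] \\
 &= \frac{\Pr[C\cdot w_2 = 0 \bmod q \wedge C\in\mathcal{C} \wedge C\in(\mathcal{C}\setminus H_1)]}
         {\Pr[C\in\mathcal{C} \wedge C\in(\mathcal{C}\setminus H_1)]} \\
 &\le \frac{\Pr[C\cdot w_2 = 0 \bmod q \wedge C\in\mathcal{C}]}
          {\Pr[C\in\mathcal{C} \wedge C\in(\mathcal{C}\setminus H_1)]}
  = \frac{|H_2|}{|\mathcal{C}\setminus H_1|}
  = \frac{|H_1|}{|\mathcal{C}\setminus H_1|}
\end{aligned}
\]

where the inequality follows from the fact that, given three events E_1, E_2, E_3,
it always holds that Pr[E_1 ∧ E_2 ∧ E_3] ≤ min{Pr[E_1 ∧ E_2], Pr[E_1 ∧
E_3], Pr[E_2 ∧ E_3]}; and the last equality follows since the hyperplanes H_1 and
H_2 have the same dimension.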

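The same reasoning applies to the generic i-th verification query, where,
w.l.o.g., A chooses w_i outside the space generated by the previous w_j’s, i.e.,
w_i ∉ ⟨w_1,...,w_{i−1}⟩_q. At such query, A knows that C ∈ 𝒞 \ (∪_{j=1}^{i−1} H_j).
Analogously as before we get that

\[
\Pr\Big[\mathsf{bad}_i \,\Big|\, \bigwedge_{j=1}^{i-1}\neg\mathsf{bad}_j\Big]
 \le \frac{\Pr[C\cdot w_i = 0 \bmod q \wedge C\in\mathcal{C}]}
          {\Pr\big[C\in\mathcal{C}\setminus\big(\bigcup_{j=1}^{i-1}H_j\big) \wedge C\in\mathcal{C}\big]}
 = \frac{|H_i|}{\big|\mathcal{C}\setminus\big(\bigcup_{j=1}^{i-1}H_j\big)\big|}. \tag{3}
\]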

The proof concludes using the results in [6], which give the cardinality of
𝒞 \ (∪_{j=1}^{i−1} H_j) for all i = 1,...,q_V + 1. Substituting this value into
Equation (3) returns the bound on Pr[bad_i = 1 | ∧_{j=1}^{i−1} ¬bad_j] stated in
Lemma 1, where the last step follows from a chain of inequalities that holds
as 1 ≤ k < n and q > 1.


4.2 A Compiler for Progressive Mv-Style Verification 


Our compiler for progressive verification builds on the result presented in 
Sect. 4.1. Given a signature scheme X with Mv-style verification, we define the 
T steps of a progressive verification X? for X as shown in Fig. 5. 

The value T sets the upper bound on the number of linear constraints 
the verifier wants to check, hence T = rows(M), where M is the matrix 
employed in the original signature verification of X. The set of admissible 
states S includes Ø and any possible state output by some ProgVer,, specifi- 
cally S = {0, 1} x Zy7wsF)xcols(Z) „ prows(v)xcols(v) x 69 144 U Ø, We extract 
the confidence level from the probability of a progressive forgery (as motivated 
by the proof of security given in Theorem 1). It is easy to see that the probability
that an adversary creates a progressive forgery for an interruption step t is at
most 1/q^t; this follows from the same reasoning as in the proof of Theorem
1 for efficient verification. Concretely, the bound is derived from [6], where we
only consider Pr[bad_1] as svk is refreshed with every new efficient verification
query, and so there is no useful cross-query leakage, and we replace the confidence
level k of the efficient verification with the interruption parameter t. If the
size of the underlying algebraic structure is q = 2^{poly(λ)}, this probability is
negligible already for t = 1. In other words, for signatures with Mv-style verification


ProgStep_0(st, pk, μ, σ)
  1: svk ← offVer(pk, T)
  2: parse svk = (T, Z, PK.aux)
  3: b ← Check(PK.aux, μ, σ)
  4: st ← GetZV(svk, μ, σ)
  5: return (b, st)

ProgStep_i(st, pk, μ, σ)
  1: b ← 0
  2: parse st = (Z', v)
  3: if Z'[i, *] · v[*, i] = 0 mod q
  4:   return (b ← 1, st)
  5: return (b ← 0, st)

α_prog : {0,...,T} → [0, 1],  α_prog(t) = 1 − 1/q^t


Fig. 5. Our compiler for progressive verification of signatures with Mv-style verification. 
The algorithms offVer, Check and GetZV are precisely as defined in Fig. 4, and 
T = rows(M). The notation Z'[i, *] describes the i-th row of the matrix Z'; similarly, 
v[*, i] describes the i-th column of v (which is usually a vector v, but may be a matrix 
in some constructions). 
defined on exponentially large algebraic structures, efficient verification and progressive 
verification coincide, trivially. The interesting case is q = poly(λ), as 
the adversary could create a progressive forgery with non-negligible probability. 
We remark that in this section we are not targeting efficiency, and our instan- 
tiations of progressive verification refresh the svk produced by offVer at every 
verification query. This way, A cannot exploit the information possibly leaked 
by a progressive forgery in future forgery attempts. 
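
The row-by-row check can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the figures: the offline step is modelled as drawing a fresh uniform T x T matrix C and publishing Z = C·M mod q, the Check call of ProgStep_0 is omitted, and the function names off_ver, prog_verify and alpha_prog are hypothetical. It is a toy illustration of checking t rows of Z·v = 0 mod q and reporting confidence 1 − 1/q^t, not the authors' implementation.

```python
import random
from fractions import Fraction


def off_ver(M, q, rng):
    """Assumed offline step: sample a fresh uniform C and return Z = C*M mod q."""
    T = len(M)
    C = [[rng.randrange(q) for _ in range(T)] for _ in range(T)]
    return [[sum(c * m for c, m in zip(row, col)) % q for col in zip(*M)]
            for row in C]


def prog_verify(Z, v, q, t):
    """Check the first t rows of Z*v = 0 mod q; return 1 to accept, 0 to reject."""
    for i in range(t):
        if sum(z * x for z, x in zip(Z[i], v)) % q != 0:
            return 0
    return 1


def alpha_prog(q, t):
    """Confidence after t successful steps, following alpha_prog(t) = 1 - 1/q^t."""
    return 1 - Fraction(1, q ** t)


if __name__ == "__main__":
    q = 5
    M = [[1, 4], [2, 3]]          # M*[1, 1] = [5, 5] = 0 mod 5
    Z = off_ver(M, q, random.Random(0))
    for t in range(len(M) + 1):
        print(t, prog_verify(Z, [1, 1], q, t), alpha_prog(q, t))
```

With q = 5, interrupting after two rows gives confidence 24/25, which shows why small (polynomial) q is the case where the number of checked rows matters.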


Theorem 2. Let Σ be an existentially unforgeable signature scheme with Mv-style 
verification (as of Fig. 3). Then the scheme Σ^P obtained via our compiler 
(in Fig. 5) is a secure realization of progressive verification for Σ. 


Proof. Following Definition 7, we can realize secure progressive verification by 
setting α_prog(t) = 1 − Pr[Exp(A) = (1, t)] + ε(λ) for all t = 0, ..., T. The 
core part of the proof is to estimate this probability. 

Recall that our compiler for progressive Mv-style verification (in Fig. 5) runs 
offVer at every verification query (line 1 in ProgVer_0). This means that every 
verification query is answered using a freshly generated svk. In particular, the 
final verification (line 4 in the experiment Exp(A) in Fig. 2) checks A's output using 
independent randomness from the previous queries. So, whatever information 
the adversary may have collected from previous queries is useless to win the 
experiment. As a consequence, the probability that the adversary wins the game 
equals the probability that the adversary outputs a valid forgery without querying 
OProgVer. The latter is precisely the probability of the event bad_1 defined 
in the proof of Theorem 1, where now we consider the matrix C to have t* rows 
instead of k. Hence from Lemma 1 it follows that 

\[ \Pr[\mathsf{Exp}(\mathcal{A}) = (1, t^*)] \le \frac{1}{q^{t^*}} \]

and: 

\[ \mathsf{Adv}(\mathcal{A}) = \Pr[\mathsf{Exp}(\mathcal{A}) = (1, t^*)] - \bigl(1 - \alpha_{prog}(t^*)\bigr) \le \frac{1}{q^{t^*}} - \Bigl(1 - \bigl(1 - \frac{1}{q^{t^*}}\bigr)\Bigr) = 0. \]


4.3 Combining Progressive and Efficient Verification 


We observe that progressive verifications obtained with our transformation can 
be split into two parts: a one-time, computationally intensive setup (ProgStep_0), 
and an efficient online verification (ProgStep_1 to ProgStep_T). This gives rise 
to custom (intentionally adjustable) verification soundness, which from the 
application perspective makes post-quantum secure verification accessible to a 
larger range of devices, and from the theoretical perspective draws interesting 
connections between classical, information-theoretic and post-quantum security 
notions. We include a more detailed discussion on this in the full version [6]. 
Abstract. Attribute-based Signatures (ABS) allow users to obtain 
attributes from issuing authorities, and sign messages whilst simulta- 
neously proving compliance of their attributes with a verification policy. 
ABS demands that both the signer and the set of attributes used to 
satisfy a policy remain hidden to the verifier. Hierarchical ABS (HABS) 
supporting roots of trust and delegation were recently proposed to alle- 
viate scalability issues in centralised ABS schemes. 

An important yet challenging property for privacy-preserving ABS is 
revocation, which may be applied to signers or some of the attributes 
they possess. Existing ABS schemes lack efficient revocation of either 
signers or their attributes, relying on generic costly proofs. Moreover, in 
HABS there is a further need to support revocation of authorities on the 
delegation paths, which is not provided by existing HABS constructions. 

This paper proposes a direct HABS scheme with a Verifier-Local Revo- 
cation (VLR) property. We extend the original HABS security model to 
address revocation and develop a new attribute delegation technique with 
appropriate VLR mechanism for HABS, which also implies the first ABS 
scheme to support VLR. Moreover, our scheme supports inner-product 
signing policies, offering a wider class of attribute relations than previ- 
ous HABS schemes, and is the first to be based on lattices, which are 
thought to offer post-quantum security. 


Keywords: Attribute-based Signatures · Revocation · Delegation · Lattices 


1 Introduction 


(Hierarchical) Attribute-Based Signatures. To provide privacy-preserving 
authentication, Attribute-based Signatures (ABS), introduced in [25,32], allow 
users to collect attributes from authorities and produce signatures showing 
attribute-compliance with some signing policy. A core security property of ABS 
schemes is that they are attribute-hiding, and for schemes that consider multiple 
users, it is often required that they also remain anonymous. A second security 
property, unforgeability, prevents users from generating signatures for policies 
for which they do not have a satisfying set of attributes. 

Most constructions for ABS schemes [14,15,17,34,35] are based on bilinear 
groups and make use of the flexible Groth-Sahai proof system [21] to provide 
anonymity guarantees. Notable exceptions include constructions from RSA [22] 
and recent work in the lattice setting [16,38,40,41], which are in the random 
oracle model. Originally, ABS schemes were proposed in the centralised model, 
that is, one central authority is responsible for all attribute issuance, but to allow 
for larger scalability, decentralised schemes [15] have also been developed. 

More recently, Hierarchical Attribute-Based Signatures (HABS) [13,18,19] 
overcome the shortcomings of previous schemes by allowing attribute delegation 
to intermediate authorities. In particular, a central Root Authority (RA) dele- 
gates issuing rights of a subset of attributes to lower tier Intermediate Authorities 
(IA) who can delegate further, or issue directly to a user. This overcomes the 
bottleneck of requiring a single authority to issue all attributes in a scheme with 
either a large number of users or attributes, and also allows a verifier to trust a 
signature without having to trust each authority in the scheme, as is the case in 
decentralised constructions. 


Revocation. A desirable property of any privacy-preserving signature is the 
support for user revocation. This would enable a trusted authority to prevent 
users from producing signatures that pass verification, without compromising 
the anonymity of honest participants. Revocation for a hierarchical structure 
of authorities would require the ability to check that a revoked authority does 
not appear anywhere in the delegation path of an attribute. This brings new 
challenges and any HABS construction would have to perform these additional 
checks when verifying the HABS signature. Specific to attribute-based proto- 
cols, it may also be desirable to revoke an attribute itself, rather than issuing 
authorities. For example, this may be required in the setting where attributes 
may depend on the time period or can be changed dynamically. 

Revocation techniques typically follow one of a few approaches. Firstly, it can 
be achieved by requiring signers to update their secret credentials in order to 
produce a valid signature. Another approach is to use a public revocation list, 
which is updated with some information about revoked users. When a signa- 
ture is formed, the signer typically proves in zero-knowledge that its information 
does not appear in the list. Finally, we have verifier-local revocation which puts 
the onus on the verifier to check that signatures have not been generated by a 
revoked signer. This approach still requires up-to-date revocation information 
but more closely resembles traditional public key infrastructures that typically 
use Certificate Revocation Lists, and can allow for more efficient constructions 
as it bypasses the need for costly zero-knowledge proofs when generating signa- 
tures. Previously, VLR as a means of revocation has appeared in group signa- 
tures (introduced in [7]) but it remains an open problem for an ABS scheme to 
support any revocation technique¹. We note that Herranz [22] proposed a scheme 
called Revocable Attribute-based Signatures, however revocation here refers to 
the revocation of anonymity. 


Contribution. In this paper we improve upon security and functionality of 
existing HABS constructions by proposing a lattice-based scheme which sup- 
ports revocation and a wider range of signing policies. Our scheme is based on 
the widely used LWE and SIS assumptions over integer lattices, and supports 
inner-product relations which allow for conjunctive, disjunctive and threshold 
policies as well as polynomial evaluations of attributes [23]. Revocation in our 
HABS scheme uses a novel VLR mechanism to revoke signers and attributes 
as well as intermediate authorities. We model HABS security and use an inte- 
gration of techniques from identity-based encryption, trapdoor delegation and 
signature schemes as well as novel techniques to realise our construction. This 
work also implies the first lattice-based (non-hierarchical) ABS scheme with the 
aforementioned properties. 


Related Work. In this section we review related works on VLR, lattice-based 
signatures and signing policies in ABS schemes. 


Revocation. VLR was first suggested in [3] and formalised in [6] and has been 
widely researched since then, for example, improving efficiency (e.g. [42]), func- 
tionality (e.g. [12]), stronger security properties (e.g. [8]) or basing on different 
hardness assumptions (e.g. lattices [24], bilinear groups [42]). The first scheme 
secure in the standard model that supported VLR was a group signature scheme 
by Libert and Vergnaud [26], based on the DLIN and variants of Diffie-Hellman 
type assumptions. In the recent lattice-based VLR group signature scheme from 
Langlois et al. [24], signing requires knowledge of a secret revocation token. We 
note that this technique cannot be transferred to the HABS setting as a signa- 
ture must also include tokens for intermediate authorities, which are part of the 
secret, thus a new approach is needed. 


Post-Quantum Security. Most ABS schemes are based on bilinear groups [14, 15, 
17, 34,35], or RSA [22] and do not offer post-quantum security. As some lattice- 
based hardness assumptions are believed to be resistant to quantum adversaries, 
this area has attracted significant research interest. As a result, there have been 
many privacy-preserving signature schemes, such as group signature schemes 
(e.g. [24,28-30]), ring signatures (e.g. [5,11]), anonymous attribute tokens [9] 
and even ABS schemes (e.g. [16,38,40,41]). However, whilst ABS have been 
proposed from lattices, current literature falls short of the delegation offered by 
HABS. 


Signing Policies in (H)ABS. Constructing schemes with more expressive sign- 
ing policies is an active area of research for ABS, as it allows for a wider 
range of use-cases and offers signers more flexibility. Despite this, many schemes 


1 We note here the work [37] of Su et al. that claims to propose a revocable ABS 
scheme; however, their scheme does not hide the attributes (nor does it take 
a signing policy), so it does not meet traditional definitions of ABS. 
[15,32], including all known HABS constructions [13,19], utilise span programs 
that result in restrictive monotone boolean policies. Wang et al. [40] provide 
a different construction for threshold policies but benefit from shorter private 
key sizes over comparable schemes. There are notable exceptions that support 
even unbounded circuits [16]. In particular, [4] offers a lattice construction for a 
threshold scheme in a centralised setting, and then shows how to transform this 
to support more expressive (∧, ∨)-policies. ABS from lattices supporting inner- 
product policies [41] have been proposed, yet without distinguishing between 
signers, which prevents any meaningful definition for delegation or revocation. 


2 Preliminaries 


We denote vectors by lower-case bold letters (a), and use capital bold font for 
matrices (A). The transpose of a matrix A (or vector) is denoted by A^T, and 
the concatenation of matrices (or vectors) A and B by [A||B]. We use I to 
denote the identity matrix, and if we wish to be clear on the dimension then we 
write I_{n×m}, for some naturals n and m. The interval [a, b] is used to denote all 
integer values x in the range a ≤ x ≤ b. Sampling a random variable x from a 
distribution 𝒳 is written x ← 𝒳. The maximum number of users in the scheme 
is given by N = 2%, and we denote the security parameter by À. The number of 
levels in the hierarchy is ℓ, and we denote a signing policy by Ψ and set δ := |Ψ|, 
i.e. the number of attributes that form the signing policy. 


Lattices. Let n, m, q ≥ 2 be integers. For a matrix A ∈ Z_q^{n×m}, define the m- 
dimensional lattice Λ^⊥(A) = {z ∈ Z^m : A·z = 0 mod q} ⊆ Z^m. For a vector 
u in the image of A, define the coset Λ_u^⊥(A) = {z ∈ Z^m : A·z = u mod q}. 


LWE. The (Decisional) Learning With Errors (LWE_{n,m,q,χ}) problem is as follows. 
Let n, m ≥ 1, q ≥ 2 and χ be a probability distribution over Z. Let s ∈ Z_q^n; 
then D_{s,χ} is the distribution obtained by sampling a ← Z_q^n and e ← χ and 
computing (a, a^T s + e) ∈ Z_q^n × Z_q. Then LWE_{n,m,q,χ} requires an adversary to 
distinguish m samples chosen from D_{s,χ} from m uniform samples from Z_q^n × Z_q. 

SIS. The Short Integer Solution problem (SIS_{n,m,q,β}), introduced in [1], requires 
an adversary, given a uniformly random matrix A ∈ Z_q^{n×m}, to find a non-zero vector 
z ∈ Z^m such that ||z|| ≤ β and A·z = 0 mod q. We define the Inhomogeneous 
Short Integer Solution problem (ISIS_{n,m,q,β}) as SIS but for a non-zero syndrome, i.e. 
A·z = u mod q. 
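
To make the two problem statements concrete, the following Python sketch generates LWE samples and checks a candidate (I)SIS solution. It is a toy illustration under assumptions of our own: the error distribution χ is replaced by the uniform distribution on [−noise, noise], the norm is the Euclidean norm, and the helper names lwe_samples and is_sis_solution are hypothetical. The parameters are far too small to offer any security.

```python
import random


def inner(a, b):
    """Integer inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lwe_samples(s, m, q, rng, noise=1):
    """Return m pairs (a, <a, s> + e mod q); e is uniform in [-noise, noise]
    (a stand-in for the error distribution chi)."""
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in s]
        e = rng.randint(-noise, noise)
        samples.append((a, (inner(a, s) + e) % q))
    return samples


def is_sis_solution(A, z, q, beta, u=None):
    """Check z != 0, ||z||_2 <= beta and A*z = u mod q (u = 0 gives plain SIS)."""
    u = [0] * len(A) if u is None else u
    if not any(z):
        return False
    if inner(z, z) > beta * beta:
        return False
    return all(inner(row, z) % q == ui % q for row, ui in zip(A, u))


if __name__ == "__main__":
    rng = random.Random(0)
    print(lwe_samples([1, 2, 3], 2, 97, rng))
    A = [[1, 2, 3], [4, 5, 6]]
    print(is_sis_solution(A, [1, -2, 1], 7, 3))            # SIS instance
    print(is_sis_solution(A, [1, 0, 0], 7, 3, u=[1, 4]))   # ISIS instance
```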


3 VLR-HABS Model: Entities and Definitions 


We start with the description of entities for the VLR-HABS ecosystem. 


Attribute Authorities. The set of Attribute Authorities (AA) comprises the 
Root Authority (RA) and Intermediate Authorities (IAs). As the name suggests, 
the RA is the root of the hierarchy, and upon setup defines the universe of 
attributes 𝔸. With its key pair (skd_0, pkd_0), the RA can delegate a subset of 
attributes to IAs which hold their own key pairs (skd_i, pkd_i), i > 0. IAs can 
further delegate/issue attributes to other IAs or to any end user. This allows for 
a dynamically expandable VLR-HABS hierarchy to be established. 


Users. With key pair (usk,upk), a user joins the scheme by being issued 
attributes from potentially many AAs. Then, a user can use usk to create a 
VLR-HABS signature, provided their issued set of attributes A satisfies the policy, 
i.e. Ψ(A') = 1 for some A' ⊆ A and a signing policy Ψ. Users are prevented 
from delegating attributes further and thus can be viewed as the lowest tier of 
the hierarchy. We realise this in our scheme by requiring users to obtain public 
keys in a different space to that of authorities. 


Warrants. A warrant is used to store delegated attributes for each IA or user. 
It contains the attribute, the delegation information, and a list of identities that 
comprise the delegation path of the attribute. Warrants are updated any time 
a new attribute is issued by appending a new entry. We use the notation |warr| 
to denote the size of the warrant, i.e. the number of attributes stored in the 
warrant warr, and we use |warr[a]| to denote the length of the delegation path 
of the attribute a ∈ 𝔸. During the signing phase, the user submits a reduced 
warrant for an attribute set A' ⊆ A that satisfies Ψ(A') = 1. We fix the maximum 
depth of the delegation path to be ℓ ∈ poly(λ) and stress this is not a restriction 
on the minimum. 
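
The bookkeeping role of a warrant can be sketched as a small data structure. This is an illustrative Python sketch only: the class name Warrant, the function att_issue and the use of plain strings as identities are hypothetical, and the cryptographic delegation information that a real warrant carries is deliberately left out.

```python
class Warrant:
    """Maps each attribute to its delegation path (list of identities, root first)."""

    def __init__(self, paths=None):
        self.paths = dict(paths or {})

    def __len__(self):
        # |warr|: the number of attributes stored in the warrant
        return len(self.paths)

    def path_length(self, attribute):
        # |warr[a]|: the length of the delegation path of attribute a
        return len(self.paths[attribute])

    def reduced(self, attributes):
        # Reduced warrant: only the attributes used to satisfy the policy
        return Warrant({a: self.paths[a] for a in attributes})


def att_issue(issuer, attribute, recipient_id, recipient, max_depth):
    """Append a new entry for `attribute` to the recipient's warrant.

    The issuer must hold the attribute, and the resulting path must not
    exceed the fixed maximum depth.
    """
    if attribute not in issuer.paths:
        raise KeyError("issuer does not hold this attribute")
    path = issuer.paths[attribute] + [recipient_id]
    if len(path) > max_depth:
        raise ValueError("delegation path exceeds the maximum depth")
    updated = Warrant(recipient.paths)
    updated.paths[attribute] = path
    return updated


if __name__ == "__main__":
    root = Warrant({"student": ["RA"], "staff": ["RA"]})
    ia = att_issue(root, "student", "IA1", Warrant(), max_depth=3)
    user = att_issue(ia, "student", "U7", Warrant(), max_depth=3)
    print(user.paths)
```

Shorter paths are always allowed here, which mirrors the remark that the depth bound is a maximum and not a minimum.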


Tracing Authority. The tracing authority (TA), independent of the hierarchy, 
is responsible for removing anonymity in the case of misuse. It can identify the 
signer and all authorities on the delegation paths for attributes that the signer 
used to satisfy the signing policy, and proves correctness of these identities by 
producing a publicly verifiable proof. 


Revocation Authority. The Revocation Authority (RevA) is a trusted third 
party that acts independently of the hierarchy. The role of the RevA is to publish 
a list of revoked IDs that cause any signature generated with a corresponding 
revoked identity to fail verification. The RevA, with a secret key, would require 
input of a user or AA identity in order to execute its function, which given the 
anonymity of VLR-HABS, could require extraction from a signature by the TA. 
In practice, it might be likely that the TA and RevA would be instantiated as a 
single authority whose role covers both functions, however, we present them as 
independent parties to cover a more general scheme. 


Definition 1 (VLR-HABS). A VLR-HABS := (Setup, UKGen, AKGen, 
AttIssue, Revoke, Sign, Verify, Trace, Judge) consists of nine processes: 


• Setup(1^λ) is the initialisation process. Based on some security parameter 
λ ∈ ℕ, the public parameters pp of the scheme are defined. In this phase, the 
root, tracing and revocation authorities independently generate their own key 
pairs, i.e. RA's (skd_0, pkd_0), TA's (sk_TA, pk_TA) and RevA's (sk_RevA, pk_RevA). In 
addition, RA defines the universe of attributes 𝔸, and initialises an empty list 
RevokeList. We stress that due to the dynamic hierarchy, the system can be 
initialised by publishing (pp, pkd_0, pk_TA, pk_RevA) with 𝔸 and RevokeList contained 
in pp. 

• UKGen(pp, skd_0) is a key generation algorithm executed by the root authority 
for users and issued to users as (usk, upk, id). 

• AKGen(pp) is a key generation algorithm executed independently by intermediate 
authorities. Each IA generates its own public key, i.e., pkd_i, id_i (i > 0). 

• AttIssue(warr_i, a, {pkd_j | upk_j}) is an algorithm that is used to delegate 
attributes to an authority id_j with pkd_j or issue them to the user uid_j with 
upk_j. On input of an authority's warrant warr_i, an attribute a from warr_i, 
and the public key of the entity to which attributes are delegated or issued, it 
outputs a new warrant warr for that entity. 

• Revoke(sk_RevA, id) is an algorithm executed by the Revocation Authority. Using 
RevokeList from the implicit input pp, and on input of a User or AA ID 
(uid, id), it outputs an updated RevokeList. 

• Sign((usk, warr), m, Ψ) is the signing algorithm. On input of the signer's usk 
and (possibly reduced) warr, a message m and a predicate Ψ, it outputs a signature σ. 

• Verify(pkd_0, (m, Ψ, σ)) is a deterministic algorithm that outputs 1 if a candidate 
signature σ on a message m is valid with respect to the predicate Ψ and 
revocation list RevokeList from pp, and 0 otherwise. 

• Trace(sk_TA, pkd_0, (m, Ψ, σ)) is an algorithm executed by the TA. On input of 
its private key sk_TA and a VLR-HABS signature σ, it outputs either a triple 
(upk, warr, π̂) if the tracing is successful or ⊥ to indicate its failure. Note that 
warr contains attributes and delegation paths that were used by the signer. 

• Judge(pk_TA, pkd_0, (m, Ψ, σ), (upk, warr, π̂)) is a deterministic algorithm that 
checks a candidate triple (upk, warr, π̂) from the tracing algorithm and outputs 
1 if the triple is valid and 0 otherwise. 
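
As a reading aid, the nine processes can be written down as an abstract interface. The Python sketch below is hypothetical: the class name VLRHABS, the method names and the argument names are ours, and no cryptographic behaviour is implemented. It only records which inputs each process takes according to the list above.

```python
from abc import ABC, abstractmethod


class VLRHABS(ABC):
    """Interface mirroring the nine VLR-HABS processes (signatures only)."""

    @abstractmethod
    def setup(self, security_parameter):
        """Return public parameters and the key pairs of RA, TA and RevA."""

    @abstractmethod
    def ukgen(self, pp, skd_0):
        """Run by the root authority; return (usk, upk, id) for a user."""

    @abstractmethod
    def akgen(self, pp):
        """Run by an intermediate authority; return its public key and id."""

    @abstractmethod
    def att_issue(self, warr_i, attribute, recipient_public_key):
        """Delegate or issue an attribute; return the recipient's new warrant."""

    @abstractmethod
    def revoke(self, sk_reva, identity):
        """Run by the Revocation Authority; return the updated RevokeList."""

    @abstractmethod
    def sign(self, usk, warr, message, policy):
        """Return a signature on message under the signing policy."""

    @abstractmethod
    def verify(self, pkd_0, message, policy, signature):
        """Deterministic; return 1 if valid with respect to RevokeList, else 0."""

    @abstractmethod
    def trace(self, sk_ta, pkd_0, message, policy, signature):
        """Return (upk, warr, proof) on success, or None to indicate failure."""

    @abstractmethod
    def judge(self, pk_ta, pkd_0, message, policy, signature, triple):
        """Deterministic; return 1 if the traced triple is valid, else 0."""
```

A concrete scheme would subclass this interface; the abstract base class cannot be instantiated until all nine methods are provided.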


A VLR-HABS scheme satisfies the correctness property if any signature o gen- 
erated based on an honestly issued warrant that satisfies the signing policy, will 
verify and trace correctly, if and only if identities used in the warrant have not 
been revoked. The output (upk, warr, 7) of the tracing algorithm on such sig- 
natures will be accepted by the public judging algorithm with overwhelming 
probability. Formally, we have: 


Definition 2 (Correctness). A VLR-HABS scheme is correct if the following 
condition holds: 


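If Ψ(A) = 1 and ∀a ∈ A, ∃ warr[a] ∈ warr s.t. warr[a] is valid, then: 

\[ \mathsf{Verify}(\mathsf{pkd}_0, (m, \Psi, \mathsf{Sign}((\mathsf{usk}, \mathsf{warr}), m, \Psi))) = 1 \iff \forall\, \mathsf{id} \in \mathsf{warr}:\ \mathsf{id} \notin \mathsf{RevokeList} \]

and 

\[ \mathsf{Judge}(\mathsf{pk}_{TA}, \mathsf{pkd}_0, (m, \Psi, \sigma), \mathsf{Trace}(\mathsf{sk}_{TA}, \mathsf{pkd}_0, (m, \Psi, \sigma))) = 1 \]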


3.1 Security Properties of VLR-HABS 


Our security definitions are closely related to path anonymity, path traceability, 
and non-frameability from [13] but with modifications to allow for revocation 
functionality. We give new game-based definitions assuming probabilistic poly- 
nomial time (PPT) adversaries interacting with VLR-HABS entities through a 
set of oracles given below and formally described in Fig. 1. 


– O_RegU : A registers new users through this oracle, for which a key pair will 
be generated and added to List. The public key is given to the adversary. 
Initially, the entity is considered honest, and so the public key is also added 
to the list HUList. 

– O_RegA : A registers new IAs through this registration oracle, for which an 
identity will be generated and added to AList, which is given to the adversary. 

– O_CorrU : This oracle allows A to corrupt registered users. Upon input of a 
public key, the corresponding private key is given as output if it exists in List. 
The public key is removed from HUList so the oracle keeps track of corrupt 
entities. 

– O_CorrA : This oracle allows A to corrupt registered IAs and User attribute 
keys. Upon input of a public key and an attribute, the corresponding private 
key is given as output if the pair exists in AList. The identity is removed 
from HAList so the oracle keeps track of corrupt delegations. 

– O_Att : A uses this oracle to invoke an attribute authority to delegate attributes 
to either an IA or to a user. In particular, the adversary has control over which 
attributes are issued and the oracle outputs a warrant warr if both parties are 
registered, otherwise it outputs ⊥. The public key and attribute are added to 
a list HAList, that is initialised with {0, ⊥, ⊥, ⊥, a}, ∀a ∈ 𝔸. 

– O_Sig : A uses this oracle to obtain a VLR-HABS signature from a registered 
user. The adversary provides the warrant (and implicitly the attributes used), 
signing policy, message and the public key of the signer. If the attribute 
set satisfies the policy, and the public key is contained in HUList then the 
signature will be given to A, otherwise ⊥ is returned. 

– O_Tr : A uses the Trace oracle on a VLR-HABS signature (provided by the 
adversary) to extract the attributes and identities. The TA does verification 
checks and, upon failure, will return ⊥; otherwise it outputs warr. 

– O_RevID : A uses this oracle to revoke a user. The adversary has control over 
which IDs (both Users and AAs) are revoked. The oracle outputs an updated 
revocation list RevokeList if the entity exists in List or AList, otherwise it 
outputs ⊥. 


Path Anonymity. This property guarantees anonymity of the signer as well as 
all intermediate authorities involved in attribute-delegation for attributes used 
to satisfy the signing policy. The definition for path anonymity for a VLR-HABS 
scheme is closely related to that given in [13], however we make adjustments to 
allow for the revocation feature. Our definition captures unlinkability for unrevoked 
signers. The experiment for path anonymity, defined in Fig. 2, requires 
a two-stage PPT adversary (A_1, A_2) to distinguish which warrant and private 
key were used in the generation of the challenge VLR-HABS signature σ_b. Initially, 
A_1 generates the authority and user hierarchy, utilising the registration 
and delegation oracles. A challenge VLR-HABS signature σ_b is then generated according to the 
predefined challenge bit b, using warrants and keys provided by the adversary. 
Then, with access to the tracing oracle, the adversary A_2 guesses b'. We note 
that the game returns 0 if A revokes the identity in either of the warrants warr_0 
and warr_1 that it provides to the experiment. Since it does not have access to the 
revoke oracle in the second phase of the experiment, it cannot use this to help 
determine the challenge bit. 


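Definition 3 (Path Anonymity). A VLR-HABS scheme offers path anonymity 
if no PPT adversary A can distinguish between Exp^{pa-0}_{VLR-HABS,A} and Exp^{pa-1}_{VLR-HABS,A} 
defined in Fig. 2, i.e., the following is negligible in λ: 


\[ \mathsf{Adv}^{pa}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = \bigl|\Pr[\mathsf{Exp}^{pa\text{-}0}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = 1] - \Pr[\mathsf{Exp}^{pa\text{-}1}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = 1]\bigr| \]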


O_RegU(i), i ∉ List 
1: (id, usk_i, upk_i) ← UKGen(pp) 
2: List ← List ∪ {(i, id, upk_i, usk_i)} 
3: HUList ← HUList ∪ {i} 
4: return (id, upk_i) 

O_RegA(i), i ∉ AList 
1: id ← AKGen(pp) 
2: AList ← AList ∪ {(i, id, ⊥, ⊥, ⊥)} 
3: return id 

O_CorrU(i) 
1: HUList ← HUList \ {i} 
2: return skd_i from List 

O_CorrA(i, pkd_{i,a}, a) 
1: HAList ← HAList \ {i, pkd_{i,a}, a} 
2: return skd_{i,a} from AList 

O_Att(i, warr_i, a, {id_j | uid_j}) 
1: L := {(i, pkd_{i,a}, a) | {i, id, pkd_{i,a}, 
2:       skd_{i,a}, a} ∈ AList} 
3: if (i, warr_i, a) ∉ L ∨ j ∉ List ∨ AList 
4:   then return ⊥ 
5: (skd_{j,a}, pkd_{j,a}) ← AttIssue(skd_{i,a}, 
6:       warr_i, a, {id_j | uid_j}) 
7: warr_j[a] ← warr_i[a] ∪ {pkd_{j,a}, id_j, a} 
8: AList ← AList ∪ {j, id_j, pkd_{j,a}, skd_{j,a}, a} 
9: HAList ← HAList ∪ {j, pkd_{j,a}, a} 
10: return warr 

O_Sig(i, warr, m, Ψ) 
1: A ← {a | a ∈ warr} 
2: if i ∈ HUList ∧ Ψ(A) then 
3:   σ ← Sign((usk_i, warr), m, Ψ) 
4:   return σ 
5: return ⊥ 

O_Tr(m, Ψ, σ) 
1: warr ← Trace(sk_TA, pkd_0, (m, Ψ, σ)) 
2: return warr 

O_RevID(i, id, RevokeList) 
1: if (i, id, *, *, [*]) ∈ List ∨ AList then 
2:   tok ← Revoke(sk_RevA, id) 
3:   RevokeList ← RevokeList ∪ {tok} 
4: return RevokeList 


Fig. 1. Oracles for VLR-HABS security experiments. 
Exp^{pa-b}_{VLR-HABS,A}(λ) 
1: (pp, skd_0, sk_TA) ← Setup(1^λ) 
2: ((usk_0, warr_0), (usk_1, warr_1), m, Ψ) ← A_1(pp, skd_0 : 
       O_RegU, O_RegA, O_CorrU, O_CorrA, O_Tr, O_RevID) 
3: if |warr_0| = |warr_1| then 
4:   σ_0 ← Sign((usk_0, warr_0), m, Ψ), σ_1 ← Sign((usk_1, warr_1), m, Ψ) 
5:   if Verify(pkd_0, (m, Ψ, σ_0)) = 1 and Verify(pkd_0, (m, Ψ, σ_1)) = 1 then 
6:     b' ← A_2(σ_b : O_Tr) 
7:     return b' ∧ A_2 did not query O_Tr(sk_TA, (m, Ψ, σ_b)) 
8: return 0 


Fig. 2. Path Anonymity Experiment for VLR-HABS 


Non-frameability. Defined in Fig. 3, and based on the definition in [13], this 
property captures traditional unforgeability notions, i.e., that no PPT adversary 
can create a VLR-HABS signature without having an honestly issued warrant for 
a set of attributes that satisfies the policy. It also forbids an adversary from fram- 
ing another user. The adversary wins if either it produces a valid VLR-HABS 
signature that verifies against a challenge key, or is able to perform delegation 
for at least one attribute on behalf of any honest authority that is not ‘below’ a 
corrupt authority. This trivially implies that the root authority must also remain 
honest. We also modify the original definition to include extra winning conditions 
that capture the scenario where the adversary is able to produce a signature that 
verifies despite using an ID that was revoked. This can be seen in line 10 of 
Fig. 3. Finally, A also wins if it can generate a signature for which its attributes 
do not satisfy the policy. 


Definition 4 (Non-frameability). A VLR-HABS scheme is non-frameable if 
no PPT adversary A wins the experiment Exp^{nf}_{VLR-HABS,A} defined in Fig. 3, i.e., 
the following advantage is negligible in λ: 


\[ \mathsf{Adv}^{nf}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = \Pr[\mathsf{Exp}^{nf}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = 1] \]


Path Traceability. This property, defined in Fig. 4, provides accountability 
for authorities in the delegation path. It ensures that any valid VLR-HABS 
signature can be traced (by the tracing authority) to the signer and the path 
of authorities that were involved in the issuance of the attributes. To win this 
game, the adversary A is required to satisfy one of two conditions. Firstly, it can 
output a VLR-HABS signature that verifies but cannot be traced (that is, the 
tracing algorithm fails), or secondly, one in which the tracing algorithm outputs 
a warrant containing at least one unknown IA or user, i.e., one that was not previously 
registered in List or AList. To prohibit trivial attacks, we require the attribute- 
issuing oracle to check that both entities are registered (are in List or AList) 
before returning a delegated attribute. 


468 D. Gardham and M. Manulis 


Exp^{nf}_{VLR-HABS,A}(λ) 
1: (pp, skd_0, sk_TA) ← Setup(1^λ) 
2: ((σ, m, Ψ), (upk_j, warr, π̂)) ← A(pp, pkd_0, sk_TA : 
       O_Att, O_Sig, O_RegU, O_RegA, O_CorrU, O_CorrA, O_RevID) 
3: if Verify(pkd_0, (m, Ψ, σ)) ∧ Judge(pk_TA, pkd_0, (m, Ψ, σ), (upk_j, warr, π̂)) then 
4:   if j ∈ HUList ∧ A did not query O_Sig((usk_j, warr), m, Ψ) then return 1 
5:   if ∃a ∈ warr ⟹ (pkd_0, pkd_1, ..., pkd_{l−1}, upk_j) = warr[a] ∧ 
6:     ∀j ∈ [0, l] : (j, pkd_j, a) ∈ HAList ∨ 
7:     ((∃i ∈ [0, l − 2] : A did not query O_Att(i, ·, a, pkd_{i+1}) 
         and ∀j ∈ [0, i] : (j, pkd_j, a) ∈ HAList) ∨ 
8:      (A did not query O_Att(l − 1, ·, a, upk_j) 
         ∧ ∀j ∈ [0, i] : (j, pkd_j, a) ∈ HAList)) then return 1 
9:   if Ψ(A) ≠ 1, where A := {a | a ∈ warr} then return 1 
10:  if ∃i s.t. id_i ∈ RevokeList ∩ warr then return 1 
11: return 0 


Fig. 3. Non-Frameability Experiment for VLR-HABS 


Definition 5 (Path Traceability). A VLR-HABS scheme offers path traceability 
if no PPT adversary A can win the experiment Exp^{tr}_{VLR-HABS,A} defined in 
Fig. 4, i.e., the following advantage is negligible in λ: 


\[ \mathsf{Adv}^{tr}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = \bigl|\Pr[\mathsf{Exp}^{tr}_{\mathrm{VLR\text{-}HABS},\mathcal{A}}(\lambda) = 1]\bigr| \]


4 VLR-HABS Scheme 


In this section we detail the core contributions. Firstly we introduce the VLR 
mechanism in Sect. 4.1, then the zero-knowledge protocol in Sect. 4.2, and present 
the scheme itself in Sect. 4.3. Our construction makes use of a number of building 
blocks. Due to space limitations, we defer details to the full version [20] where 
we also provide descriptions of how to make them compatible with our zero 
knowledge proof. 


4.1 New VLR Mechanism 


We introduce a novel verifier-local revocation scheme that relies on the LWE and 
SIS hardness assumptions. For our scheme, a central authority (RevA) maintains 
a list of revoked identities in a list RevokeList. A user is required to produce and 
publish privacy-preserving revocation tokens during the signing phase of the 
signature scheme. As part of verification, values from RevokeList are used to 
check whether the revocation tokens pass or fail verification. 


Revocable Hierarchical Attribute-Based Signatures from Lattices 469 


Exp^{tr}_{VLR-HABS,A}(λ) 
1: (pp, skd_0, sk_TA) ← Setup(1^λ) 
2: ((σ, m, Ψ), (upk, warr, π̂)) ← A(pp, sk_TA : 
       O_Att, O_RegU, O_RegA, O_CorrU, O_CorrA, O_RevID) 
3: if Verify(pkd_0, (m, Ψ, σ)) then 
4:   if Trace(sk_TA, (m, Ψ, σ)) = ⊥ then return 1 
5:   if Judge(pk_TA, pkd_0, (m, Ψ, σ), (upk, warr, π̂)) ∧ 
6:     (∃a ∈ warr ⟹ (pkd_0, pkd_1, ..., pkd_{l−1}, upk) = warr[a] ∧ 
7:     ((∃i ∈ [0, l − 2] : i ∈ HAList ∧ i + 1 ∉ AList) ∨ 
8:      (l − 1 ∈ HUList ∧ (·, upk, usk) ∉ List))) then return 1 
9: return 0 


Fig. 4. Path Traceability Experiment for VLR-HABS 


To create a revocation token, the user samples a uniform binary matrix 
B ← {0,1}^{k_3×m_3} (m_3 = m_4(ld + 1)) and computes the LWE instance C = B·R_id + E 
for each id in the delegation paths, where R_id encodes the identity id of the entity 
and E is an error matrix of size k_3 × m_4. This is repeated for each identity 
contained in the delegation paths of the obtained attributes, i.e. id ∈ warr. In 
this work, we compute R_id = (R | R_1^{id[1]} | R_2^{id[2]} | ...), where id[i] is the i-th bit of 
id and the R_i^b are uniformly sampled matrices over Z_q that are published in the 
public parameters. Finally, R is the public key of the RevA, whose corresponding 
secret key is a trapdoor T_R that allows RevA to solve SIS instances with respect 
to R. At a high level, due to the pseudo-randomness of binary-secret LWE, we 
argue that C is statistically close to a sample from a uniform distribution, thus 
no adversary can learn the identity committed to in C, which more generally 
maintains the anonymity properties of the VLR-HABS scheme. 

To revoke an identity, RevA uses its trapdoor $\mathbf{T}_R$ for $\mathbf{R}$ to compute an 
extended trapdoor for $\mathbf{R}_{id}$, using the ExtBasis and RandBasis algorithms from 
Bonsai Signatures [10], recalled in the full version [20]. These algorithms allow 
RevA to compute a short basis for an extended matrix if it has knowledge 
of a trapdoor for the input matrix $\mathbf{R}$. Here, the extended matrix is $\mathbf{R}_{id}$. The 
RandBasis algorithm randomises the extended basis, with some loss in quality, so 
that the trapdoor $\mathbf{T}_R$ cannot be recovered. RevA then invokes SampleD to compute 
a small vector $\mathbf{y}$ such that $\mathbf{R}_{id}\mathbf{y} = \mathbf{0}$, i.e. solving the SIS problem for $\mathbf{R}_{id}$. It 
appends $\mathbf{y}$ to a public revocation list RevokeList. During the verification phase, 
the verifier obtains RevokeList from the revocation authority and computes
$\mathbf{C}\mathbf{y} = \mathbf{B}\mathbf{R}_{id}\mathbf{y} + \mathbf{E}\mathbf{y}$ for each $\mathbf{y} \in$ RevokeList. If id has been revoked, then
$\mathbf{R}_{id}\mathbf{y} = \mathbf{0}$ for some $\mathbf{y} \in$ RevokeList and hence
$\mathbf{C}\mathbf{y} = \mathbf{B}\mathbf{R}_{id}\mathbf{y} + \mathbf{E}\mathbf{y} = \mathbf{B}\mathbf{0} + \mathbf{E}\mathbf{y} = \mathbf{E}\mathbf{y}$, where $\|\mathbf{E}\mathbf{y}\| < n\beta^2$.
If $\|\mathbf{C}\mathbf{y}\| > n\beta^2$ for all $\mathbf{y} \in$ RevokeList and all $\mathbf{C}$ in the signature, then 
the verifier is assured that the signature was not generated using a revoked id. 
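
As an illustration only, the revocation check can be sketched numerically. The code below is a toy model under stated assumptions rather than the scheme's implementation: the modulus, dimensions and error ranges are arbitrary; the shapes (B of size k x n, R of size n x m) are chosen only so that the products are well defined; and `planted_instance` is a hypothetical stand-in for the trapdoor-based sampling of y (it simply builds R so that Ry = 0 mod q).

```python
import random

def centered(x, q):
    """Map x mod q to its representative in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def mat_mul(A, B, q):
    """Multiply matrices A (r x s) and B (s x t) modulo q."""
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)] for row in A]

def mat_vec_norm(C, y, q):
    """Infinity norm of C*y mod q, using centered representatives."""
    return max(abs(centered(sum(c * v for c, v in zip(row, y)), q)) for row in C)

def planted_instance(n, m, q, rng):
    """Assumption: replaces trapdoor sampling. Pick a short y, then force R*y = 0 mod q."""
    y = [rng.randint(-2, 2) for _ in range(m - 1)] + [1]
    R = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    for row in R:
        # Fix the last column so that each row is orthogonal to y modulo q.
        row[-1] = (-sum(r * v for r, v in zip(row[:-1], y[:-1]))) % q
    return R, y

def make_token(R, k, q, rng):
    """Revocation token C = B*R + E with binary B and error entries in {-1, 0, 1}."""
    n, m = len(R), len(R[0])
    B = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    E = [[rng.randint(-1, 1) for _ in range(m)] for _ in range(k)]
    BR = mat_mul(B, R, q)
    return [[(BR[i][j] + E[i][j]) % q for j in range(m)] for i in range(k)]

def is_revoked(C, revoke_list, bound, q):
    """Verifier-local check: flag the token if some listed y makes C*y short."""
    return any(mat_vec_norm(C, y, q) <= bound for y in revoke_list)

rng = random.Random(0)
q, n, m, k = 2_147_483_647, 6, 12, 4
R_revoked, y = planted_instance(n, m, q, rng)
R_other = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
bound = 2 * m  # |E*y| <= m * 1 * 2 for the toy ranges above
flag_revoked = is_revoked(make_token(R_revoked, k, q, rng), [y], bound, q)
flag_other = is_revoked(make_token(R_other, k, q, rng), [y], bound, q)
print(flag_revoked, flag_other)
```

A token built from the revoked matrix is flagged because Cy collapses to Ey, whereas for an unrelated matrix Cy is spread over the whole of Z_q and exceeds the bound except with negligible probability.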

To show the correctness of $\mathbf{C}$ and that it contains the IDs encrypted in 
the ciphertext, the signer is required to generate a zero-knowledge proof. In 
the decomposition-extension framework for Stern-like protocols, this is done by 
instead letting $\mathbf{R}^* := [\mathbf{R} \,|\, \mathbf{R}_1^0 \,|\, \mathbf{R}_1^1 \,|\, \dots \,|\, \mathbf{R}_d^0 \,|\, \mathbf{R}_d^1]$ and proving the relation 
$\mathbf{C} = f_{id}(\mathbf{B})\mathbf{R}^* + \mathbf{E}$. See the full version [20] for definitions and how to remove 
the dependency of $\mathbf{A}_{id}$ on id. This makes the resulting relation linear, and it is 
efficiently provable using the proof in Sect. 4.2. We also note that $\mathbf{R}^*$ is now 
public, so the prover only needs to hide $\mathbf{B}$, $\mathbf{E}$ and id as part of the witness. 
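
The idea of replacing an identity-dependent matrix by a fixed public one can be sketched as follows. This is a minimal illustration of the usual extension trick in its simplest form (a vector multiplied on the right), not the exact encoding used in our protocol; the helper names, the toy modulus and the dimensions are assumptions made for the example.

```python
import random

def select_matrix(R, blocks, bits):
    """Column-wise concatenation [R | R_1^{bits[0]} | ... | R_d^{bits[d-1]}]."""
    chosen = [R] + [blocks[i][b] for i, b in enumerate(bits)]
    return [sum((M[r] for M in chosen), []) for r in range(len(R))]

def full_matrix(R, blocks):
    """Identity-independent matrix R* = [R | R_1^0 | R_1^1 | ... | R_d^0 | R_d^1]."""
    chosen = [R] + [blk[b] for blk in blocks for b in (0, 1)]
    return [sum((M[r] for M in chosen), []) for r in range(len(R))]

def extend(y, bits, m):
    """Pad y with zero blocks so that R* . extend(y) equals R_id . y."""
    out = list(y[:m])
    for i, b in enumerate(bits):
        part = list(y[m * (i + 1): m * (i + 2)])
        zeros = [0] * m
        # The block goes into the slot of the selected matrix; the other slot is zero.
        out += (part + zeros) if b == 0 else (zeros + part)
    return out

def mat_vec(M, v, q):
    """Matrix-vector product modulo q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]

rng = random.Random(7)
q, n, m, d = 97, 3, 4, 3

def rand_mat():
    return [[rng.randrange(q) for _ in range(m)] for _ in range(n)]

R = rand_mat()
blocks = [(rand_mat(), rand_mat()) for _ in range(d)]
bits = [1, 0, 1]
y = [rng.randint(-2, 2) for _ in range(m * (d + 1))]
lhs = mat_vec(select_matrix(R, blocks, bits), y, q)
rhs = mat_vec(full_matrix(R, blocks), extend(y, bits, m), q)
print(lhs == rhs)
```

The identity bits now only determine where the zero blocks sit in the extended vector, so the statement is linear in a secret vector with respect to a public matrix.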


4.2 Zero-Knowledge Protocol 


We define a Stern-like protocol that will form the core of our VLR-HABS scheme. 
The protocol will allow a signer to convince the verifier in zero-knowledge that: 


1. The warrant contains a set of committed attributes that satisfy the policy. 
2. For each attribute in the warrant, the signer possesses a valid delegation path. 
3. The ciphertext is a correct encryption of the IDs that appear in the warrant. 
4. The signer’s revocation token is correctly committed via an LWE function. 


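The protocol is instantiated with the following public parameters:
$\mathbf{A}, \mathbf{R}, \{\mathbf{A}_i^b\}_{i=1}^{ld}, \{\mathbf{R}_i^b\}_{i=1}^{d}, \mathbf{G}^*, \mathbf{P}^*, \mathbf{Q}, \{\mathbf{C}_{id}\}_{id \in warr}, \{\mathbf{f}_i\}_{i=1}^{\delta}, \mathbf{p}, \mathbf{u}$.
The prover’s witness consists of the vectors $uid, \mathbf{z}_0, \{\mathbf{z}_i, \mathbf{a}_i, \mathbf{e}_i\}_{i=1}^{\delta}, \{id\}_{id \in warr}$
and the matrices $\{\mathbf{B}_{id}, \mathbf{E}_{id}\}_{id \in warr}$. The relation $R_1$ is defined by the
following conditions:

  $\mathbf{A}_{[id_1 \| \dots \| id_l]}\,\mathbf{z}_i = \mathbf{a}_i \bmod q_1$ for $i \in [1, \delta]$ and $id_j \in warr[a_i]$

  $\mathbf{C}_{id} = \mathbf{B}\mathbf{R}_{id} + \mathbf{E} \bmod q$ for $id \in warr$

  $\mathbf{f}_{id} = \mathbf{P}^*\mathbf{e} + \mathbf{Q}[id_1 \| \dots \| uid] \bmod q_2$ for every $id \in warr[a_i]$, $i \in [1, \delta]$

  $\langle \mathbf{a}, \mathbf{p} \rangle = 1$ where $\mathbf{a} = [\mathbf{a}_1 \| \dots \| \mathbf{a}_{\delta}]$, and $\mathbf{A}_{uid}\,\mathbf{z}_0 = \mathbf{u} \bmod q_1$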


Theorem 1. Let COM be a statistically hiding and computationally binding 
string commitment scheme. Then the protocol is a zero-knowledge argument of 
knowledge with perfect completeness and soundness error 2/3. That is: 


• There exists a poly-time simulator that outputs an accepting transcript that 
is statistically close to one produced by an honest prover with a valid witness. 

• There exists a poly-time extractor that, on input of a commitment CMT 
and 3 responses (RSP1, RSP2, RSP3) corresponding to the challenges {1, 2, 3}, 
outputs a valid witness for the relation $R_1$. 


A full description of the protocol and the proof of Theorem 1 can be found in 
the full version of this paper [20]. There we also discuss how the sub-relation 
for each building block is proven, along with an overview of some techniques 
from a line of works by Ling et al. [27-29]. Our full protocol combines these 
techniques to prove the relation $R_1$. 
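
Since each execution has soundness error 2/3, the protocol has to be repeated to reach a negligible error. The short calculation below is a sketch of how the number of repetitions follows from a target security level; the function name is ours, and the only inputs are the soundness error from Theorem 1 and the 128-bit target stated in Table 1.

```python
import math

def repetitions(security_bits, soundness_error=2 / 3):
    """Smallest t such that soundness_error**t <= 2**(-security_bits)."""
    return math.ceil(security_bits / -math.log2(soundness_error))

print(repetitions(128))  # 219
```

For a 128-bit target this gives t = 219, which agrees with the soundness parameter listed with Table 1.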


4.3 Specification of VLR-HABS 


We now give a high-level description of our lattice-based VLR-HABS scheme and 
present the formal algorithms in Figs. 5 and 6. 
Setup: The setup algorithm generates the public parameters and is executed 
by a trusted party to initiate the scheme. It begins by setting a parameter $d$, 
where $2^d$ will be the maximum number of AAs and Users in the scheme. It 
samples uniformly random matrices $\{\mathbf{A}_i^b\}_{i=1}^{ld}$ and $\{\mathbf{R}_i^b\}_{i=1}^{d}$
($b \in \{0, 1\}$) that generate the public keys. Two 
further matrices $(\mathbf{A}, \mathbf{R})$ are computed with corresponding trapdoors $(\mathbf{T}_A, \mathbf{T}_R)$ 
according to GenBasis$(n_1, m_1, q_1)$ and GenBasis$(n_3, m_3, q_3)$, respectively. The key 
pair $(sk_{RA}, pk_{RA}) := (\mathbf{T}_A, \mathbf{A})$ is that of RA and $(sk_{RevA}, pk_{RevA}) := (\mathbf{T}_R, \mathbf{R})$ is for 
the RevA. The TA also generates its key pair for IBE-GPV as $(sk_{TA}, pk_{TA}) := 
(\mathbf{T}_D, \mathbf{D}) \leftarrow$ GenBasis$(n_2, m_2, q_2)$. Finally, it samples a vector $\mathbf{u} \in \mathbb{Z}_{q_1}^{n_1}$ that is 
used in the key-issuing phase of the scheme, and defines the attribute universe 
as $\mathbb{A} = \{a_i\}_{i=1}^{N}$ for $N$ attributes. It outputs these under public parameters 
$pp := (\mathbf{A}, \mathbf{R}, \mathbf{D}, \mathbf{u}, \{\mathbf{A}_i^b\}_{i=1}^{ld}, \{\mathbf{R}_i^b\}_{i=1}^{d}, \mathbb{A})$, which will be an implicit input to 
all algorithms. 

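UKGen: To join the scheme, the RA selects an identity id as a binary string of 
length $d$. It computes the corresponding public key $upk := \mathbf{A}_{uid}$ as
$[\mathbf{A} \,|\, \mathbf{A}_1^{uid[1]} \,|\, \dots \,|\, \mathbf{A}_d^{uid[d]}]$. It computes
$\mathbf{T}_{A_{uid}} \leftarrow$ RandBasis(ExtBasis($\mathbf{T}_A, \mathbf{A}_{uid}$), $\beta_1$), and then 
computes the user signing key as $usk = \mathbf{z}_0 \leftarrow$ SampleD($\mathbf{T}_{A_{uid}}, \mathbf{A}_{uid}, \mathbf{u}, \beta_1$), which 
satisfies $\mathbf{A}_{uid}\mathbf{z}_0 = \mathbf{u}$. The key pair is issued to the user. 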


AKGen: For an Authority joining the scheme, it is issued an identity $id \in \{0,1\}^d$. 
The keys for the authorities are issued during attribute delegation as they are 
dependent on both the attribute and position within the hierarchy. 


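AttIssue: This algorithm takes as input an attribute $a = \mathbf{a}$, an AA secret key $ask_i$, 
a public key $apk_j$ for either an IA or a user, and warr containing a matrix $\mathbf{A}_i$ 
with corresponding trapdoor $\mathbf{T}_{A_i}$. For the attribute $a$, a $k$-th level AA extends its 
public key $\mathbf{A}_{id_1 \| \dots \| id_k} = [\mathbf{A} \,|\, \mathbf{A}_1^{id_1[1]} \,|\, \dots \,|\, \mathbf{A}_{kd}^{id_k[d]}]$ to a $(k+1)$-th 
level entity by computing $\mathbf{A}' \leftarrow [\mathbf{A}_{kd+1}^{id_j[1]} \,|\, \dots \,|\, \mathbf{A}_{(k+1)d}^{id_j[d]}]$ and executing
$\mathbf{T}_{A_j,a} \leftarrow$ RandBasis(ExtBasis($\mathbf{T}_{A_i,a}, [\mathbf{A}_{id} \| \mathbf{A}']$), $\mathbf{a}, \beta_1$). If it is issuing to an authority, it sets 
$skd_j \leftarrow \mathbf{T}_{A_j,a}$; if it is issuing to a user, then it first computes a Bonsai signature 
on $\mathbf{A}_j$ with respect to $\mathbf{a}$, that is, a short vector $\mathbf{z} \leftarrow$ SampleD($\mathbf{T}_{A_j,a}, \mathbf{A}_j, \mathbf{a}, \beta_1$), 
and sets $skd_j = \mathbf{z}$. It appends $(id, \mathbf{A}_j, \mathbf{a}, skd_j)$ to the (possibly empty) warrant and 
returns warr. 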


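Revoke: To revoke an identity, the algorithm Revoke takes as input a user 
identity id. Using its secret key $sk_{RevA} = \mathbf{T}_R$, a trapdoor for $\mathbf{R}$, it sets 
$\mathbf{R}_{id} = [\mathbf{R} \,|\, \mathbf{R}_1^{id[1]} \,|\, \dots \,|\, \mathbf{R}_d^{id[d]}]$ and then computes the revocation token $\mathbf{y}$ as a short 
vector that satisfies $\mathbf{R}_{id}\mathbf{y} = \mathbf{0}$. It appends $\mathbf{y}$ to RevokeList. 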


Sign: The signing algorithm takes as input a set of attributes $\{a_i\}_{i=1}^{\delta}$ with
associated $\{\mathbf{A}_{id_i}, \mathbf{T}_{A_i,a_i} = \mathbf{z}_i\}_{i=1}^{\delta}$, the user’s secret key $usk = \mathbf{z}_0$, a message $m$ and 
a policy $\Psi$, and we denote the signer’s ID as uid for clarity. Recall that for 
each attribute, the delegated keys are short vectors that solve $\mathbf{A}_{id_i}\mathbf{z}_i = \mathbf{a}_i$, and 
similarly the public value $\mathbf{u}$ proves that the signer has knowledge of usk as it 
solves $\mathbf{A}_{uid}\mathbf{z}_0 = \mathbf{u}$. The signer then prepares its revocation tokens by computing,
for each $id \in warr$, an LWE instance $\mathbf{C}_{id} = \mathbf{B}\mathbf{R}_{id} + \mathbf{E}$, where $\mathbf{B}$ is a binary matrix
sampled from $\mathbb{B}^{k_3 \times m_3}$ and $\mathbf{E}$ is an error matrix sampled from $\chi$. $\mathbf{B}$ must be a binary 
secret so that the norm is small and we can use the zero-knowledge protocol. It 
generates keys for a one-time signature as (osk, ovk) and encrypts the identities 
of the AAs in the delegation path under the identity-based encryption scheme 
(GPV-IBE) using ovk as a pseudo-identity. To do this, it samples $\mathbf{e}_1, \mathbf{e}_2 \leftarrow \chi$ and 
$\mathbf{s} \leftarrow \mathbb{Z}_q^{n}$ and computes

$\mathbf{f}_i^{(1)} = \mathbf{D}^T\mathbf{s} + \mathbf{e}_1, \quad \mathbf{f}_i^{(2)} = \mathbf{N}^T\mathbf{s} + \mathbf{e}_2 + \lfloor q/2 \rfloor\, id_i$

and sets $\mathbf{f}_i = [\mathbf{f}_i^{(1)} \| \mathbf{f}_i^{(2)}]$ for each $i \in [1, \delta]$, where $\mathbf{N} := H_0(ovk)$ as in Fig. 6. It
generates the zero-knowledge argument of knowledge $\pi$, described in Sect. 4.2, for
the relation $R_1$. In particular, the ZKAoK ensures that each identity that appears
in a delegation path for an attribute also appears in a corresponding ciphertext
and revocation token. 
This is done by showing that the vectors t and matrices Tig. belong to the 


sets SecretExtg (d) and SecretExt g(d4?), respectively for a delegation path å, 
identity id and where the vectors d; are the message encrypted in the ciphertext. 
Using the Fiat-Shamir heuristic², the signer turns the interactive protocol into 
a non-interactive one and binds the proof to the message, policy, revocation tokens 
$\mathbf{C} = \{\mathbf{C}_{id}\}_{id \in warr}$ and ciphertext $\mathbf{f} := \{\mathbf{f}_i\}$. It computes the challenge 
as: 

$CH = \{Ch_i\}_{i=1}^{t} = \{H_1(m, \Psi, \mathbf{f}, \mathbf{C}, ovk, pp, CMT_i)\}_{i=1}^{t}$


Finally, it computes a one-time signature $\sigma_0$ over the proof $\pi$, ciphertext $\mathbf{f}$, 
message $m$ and policy $\Psi$. Since the choice of the OTS can be generic, 
we leave the function unspecified here. However, for security and instantiation, 
we reuse the Bonsai signature scheme from [10], with the use of a chameleon 
hash function $H_2$. It outputs the VLR-HABS signature
$\sigma = (\mathbf{f}, \mathbf{C}, \pi, \sigma_0, ovk)$. 
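
The challenge derivation can be sketched as follows. This is a minimal illustration under assumptions: SHA-256 stands in for the random oracle $H_1$, all inputs are taken to be already serialised as byte strings, and the length-prefixed encoding and the reduction modulo 3 are choices made for the example, not part of the scheme's specification.

```python
import hashlib

def challenges(message, policy, ciphertext, tokens, ovk, pp, commitments):
    """Derive one challenge in {1, 2, 3} per commitment from a hash of the transcript."""
    prefix = hashlib.sha256()
    for part in (message, policy, ciphertext, tokens, ovk, pp):
        # Length-prefix every field so that field boundaries are unambiguous.
        prefix.update(len(part).to_bytes(8, "big") + part)
    out = []
    for cmt in commitments:
        h = prefix.copy()
        h.update(len(cmt).to_bytes(8, "big") + cmt)
        # Reducing a 256-bit value modulo 3 leaves only a negligible bias.
        out.append(int.from_bytes(h.digest(), "big") % 3 + 1)
    return out

cmts = [bytes([i]) for i in range(8)]
ch = challenges(b"msg", b"policy", b"f", b"C", b"ovk", b"pp", cmts)
print(len(ch), set(ch) <= {1, 2, 3})
```

Because every challenge depends on the message, policy, ciphertext and revocation tokens, a proof cannot be reused for a different message or token set without recomputing the responses.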


Verify: To verify a candidate signature $\sigma$, a verifier obtains the list RevokeList 
from the RevA, potentially offline and before the signature is presented. It parses 
$\sigma$ as $(\mathbf{f}, \mathbf{C}, \pi, \sigma_0, ovk)$ and checks that $\pi$ and $\sigma_0$ pass verification. It then computes 
$\|\mathbf{C}_{id}\mathbf{y}\|$ and outputs 0 if $\|\mathbf{C}_{id}\mathbf{y}\| < n_3\beta_3$ for any $\mathbf{y} \in$ RevokeList, $id \in warr$. 


Trace: On input of a candidate VLR-HABS signature, the tracing algorithm 
parses $\sigma$ as $(\mathbf{f}, \mathbf{C}, \pi, \sigma_0, ovk)$. It first verifies $\sigma$; then, using its secret key $\mathbf{T}_D$, it 
can create an identity-dependent decryption key $\mathbf{S}_{ovk}$ for a ciphertext $\mathbf{f}_{id}$, with 
which it can extract the user ID and the identities of the authorities that appear in 
the delegation path of any attribute. This algorithm outputs $\mathbf{S}_{ovk}, \{id_i\}_{i=1}^{\delta}$. 
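
The way a decryption key opens the ciphertext can be illustrated with a toy dual-Regev-style example. This is a sketch under assumptions and not the GPV-IBE instantiation itself: the parameters are far too small to be secure, and instead of deriving N from $H_0(ovk)$ and sampling S with the trapdoor $\mathbf{T}_D$, the code plants a short S and sets N := DS so that the key equation holds by construction.

```python
import random

def centered(x, q):
    """Map x mod q to its representative in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B, q):
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)] for row in A]

def mat_vec(M, v, q):
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]

def encrypt(D, N, id_bits, q, rng):
    """f1 = D^T s + e1 and f2 = N^T s + e2 + floor(q/2) * id, with errors in {-1, 0, 1}."""
    s = [rng.randrange(q) for _ in range(len(D))]
    f1 = [(v + rng.randint(-1, 1)) % q for v in mat_vec(transpose(D), s, q)]
    f2 = [(v + rng.randint(-1, 1) + (q // 2) * b) % q
          for v, b in zip(mat_vec(transpose(N), s, q), id_bits)]
    return f1, f2

def trace(S, f1, f2, q):
    """Recover the identity bits by rounding f2 - S^T f1."""
    noisy = [centered(a - b, q) for a, b in zip(f2, mat_vec(transpose(S), f1, q))]
    return [1 if abs(v) > q // 4 else 0 for v in noisy]

rng = random.Random(3)
q, n, m, k = 12289, 8, 16, 6
D = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
S = [[rng.randint(-1, 1) for _ in range(k)] for _ in range(m)]  # planted short key (assumption)
N = mat_mul(D, S, q)                                            # stands in for H_0(ovk)
id_bits = [1, 0, 1, 1, 0, 0]
f1, f2 = encrypt(D, N, id_bits, q, rng)
print(trace(S, f1, f2, q) == id_bits)
```

Since DS = N, the term $\mathbf{N}^T\mathbf{s}$ cancels and only $\mathbf{e}_2 - \mathbf{S}^T\mathbf{e}_1 + \lfloor q/2 \rfloor id$ remains; the noise here is at most m + 1, far below q/4, so rounding recovers the bits. The same equation DS = N is what Judge re-checks.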


Judge: This algorithm verifies the correctness of the decryption of the 
IBE-GPV ciphertext. It takes as input the decryption key $\mathbf{S}_{ovk}$ and checks that 
it is a valid key for $\mathbf{f}_i$, that is, it checks $\mathbf{D}\mathbf{S}_{ovk} \stackrel{?}{=} H_0(ovk)$ and $\|\mathbf{S}_{ovk}\| < \beta_2$. If this 
passes, it also checks that the decryption is correct by evaluating $\mathbf{f} = \mathbf{P}^*\mathbf{e} + \mathbf{Q}\,id$ and 
outputs 1 if all checks hold, else it outputs 0. By using a one-time identity in 


2 As in [4], we choose to present the FS heuristic for simplicity. We note, however, 
that one could instantiate our scheme with the Unruh transform of [39] to achieve 
security in the quantum random oracle model (QROM). 
our IBE scheme, we are able to bypass expensive zero-knowledge proofs in this 
stage and instead only require the Trace algorithm to output a verifiable key for 
the “identity” ovk. 


Detailed Description. We provide the complete specification only for the Setup 
and AKGen algorithms here, and define UKGen, AttIssue, Revoke, Sign, Verify, 
Trace and Judge in Figs. 5 and 6. 


- Setup($\lambda$). It generates the Root Authority and Revocation Authority key 
pairs as $(skd_0, pkd_0) := (\mathbf{T}_A, \mathbf{A}) \leftarrow$ GenBasis$(n_1, m_1, q_1)$ and 
$(sk_{RevA}, pk_{RevA}) := (\mathbf{T}_R, \mathbf{R}) \leftarrow$ GenBasis$(n_3, m_3, q_3)$. Next it samples random
matrices $\{\mathbf{A}_i^b\}_{i=1}^{ld} \leftarrow \mathbb{Z}_{q_1}^{n_1 \times m_1}$ and $\{\mathbf{R}_i^b\}_{i=1}^{d} \leftarrow \mathbb{Z}_{q_3}^{n_3 \times m_3}$. During this 
phase, the TA keys are also computed as $(skd_{TA}, pkd_{TA}) := (\mathbf{T}_D, \mathbf{D}) \leftarrow$ 
GenBasis$(n_2, m_2, q_2)$. Define $\chi_2$ to be a $\beta_2$-bounded distribution $D_{\mathbb{Z},\alpha_2}$, and
similarly let $\chi_3 = D_{\mathbb{Z},\alpha_3}$ (i.e. bounded by $\beta_3$). 
- AKGen. Sample and output $id \leftarrow \{0,1\}^d$. 


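UKGen ($ask_0$)

 0: Sample $uid \leftarrow \{0,1\}^d$
 1: $\mathbf{A}_{uid} = [\mathbf{A} \,|\, \mathbf{A}_1^{uid[1]} \,|\, \dots \,|\, \mathbf{A}_d^{uid[d]}]$
 2: $\mathbf{T}_{A_{uid}} \leftarrow$ ExtBasis($\mathbf{T}_A, \mathbf{A}_{uid}$)
 3: $\mathbf{z}_{uid} \leftarrow$ SampleD($\mathbf{T}_{A_{uid}}, \mathbf{A}_{uid}, \mathbf{u}, \beta_1$)
 4: $id \leftarrow uid$
 5: $(skd_{id}, pkd_{id}) \leftarrow (\mathbf{z}_{uid}, \mathbf{A}_{uid})$
 6: return $(id, skd_{id}, pkd_{id})$


Revoke ($\mathbf{T}_R$, (id, RevokeList))

 0: $\mathbf{R}_{id} \leftarrow [\mathbf{R} \,|\, \mathbf{R}_1^{id[1]} \,|\, \dots \,|\, \mathbf{R}_d^{id[d]}]$
 1: $\mathbf{T}_{R^*} \leftarrow$ ExtBasis($\mathbf{T}_R, \mathbf{R}_{id}$)
 2: $\mathbf{y} \leftarrow$ SampleD($\mathbf{T}_{R^*}, \mathbf{R}^*, \mathbf{0}, \beta_3$)
 3: RevokeList $\leftarrow$ RevokeList $\cup \{\mathbf{y}\}$
 4: return RevokeList


Trace ($\sigma$, $\mathbf{T}_D$, RevokeList)

 0: Parse $\sigma$ as $(m, \Psi, \pi, \{\mathbf{f}_i\}_{i=1}^{\delta}, \{\mathbf{C}_i\}_{i \in warr})$
 1: if Verify($\mathbf{A}_0, (\pi, \sigma_0, ovk, \{\mathbf{f}_i\}_{i=1}^{\delta}, \{\mathbf{C}_i\}_{i \in warr})$, RevokeList) = 1 then
 2:   $\mathbf{S}_{ovk} \leftarrow$ SampleD($\mathbf{T}_D, \mathbf{D}, H_0(ovk), \beta_2$)
 3:   for $i \in [1, \dots, \delta]$:
 4:     Parse $\mathbf{f}_i = [\mathbf{f}_i^{(1)} \| \mathbf{f}_i^{(2)}]$
 5:     $[id_1 \| \dots \| uid] = [\mathbf{f}_i^{(2)} - \mathbf{S}_{ovk}^T \cdot \mathbf{f}_i^{(1)}]$
 6:     $warr = warr \cup \{id_1, \dots, uid\}$
 7:   return $(warr, \mathbf{S}_{ovk})$
 8: else return $\perp$


Judge ($\sigma$, warr, $\mathbf{S}_{ovk}$, RevokeList)

 0: Parse $\sigma$ as $(m, \Psi, \pi, \{[\mathbf{f}_i^{(1)} \| \mathbf{f}_i^{(2)}]\}_{i=1}^{\delta}, \{\mathbf{C}_i\}_{i \in warr})$
 1: if $\exists i$ s.t. $[\mathbf{f}_i^{(2)} - \mathbf{S}_{ovk}^T \mathbf{f}_i^{(1)}] \neq warr[i]$, return 0
 2: elseif $\mathbf{D}\mathbf{S}_{ovk} \neq H_0(ovk)$, return 0
 3: else return 1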


Fig. 5. Algorithms UKGen, Revoke, Trace and Judge of our VLR-HABS construction. 
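AttIssue ($warr_i$, $a$, [$pkd_j$ | upk])

 0: Parse $(skd_{i,a}, pkd_j)$ as $((\mathbf{T}_{A_i,a}, \mathbf{A}_i, id_i), \mathbf{A}_j, id_j)$
 1: $\mathbf{A}' \leftarrow [\mathbf{A}_{kd+1}^{id_j[1]} \,|\, \dots \,|\, \mathbf{A}_{(k+1)d}^{id_j[d]}]$
 2: $\mathbf{T}_{A'} \leftarrow$ RandBasis(ExtBasis($\mathbf{T}_{A_i,a}, [\mathbf{A}_{id} \| \mathbf{A}']$), $\mathbf{a}, \beta_1$)
 3: if $|id_j| = ld$ then
 4:   $\mathbf{z}_j \leftarrow$ SampleD($\mathbf{T}_{A'}, [\mathbf{A}_i \| \mathbf{A}_j], \mathbf{a}, \beta_1$)
 5:   $skd_{j,a} \leftarrow \mathbf{z}_j$
 6: else $skd_{j,a} \leftarrow \mathbf{T}_{A'}$
 7: return $warr = warr_i \cup \{id_j, \mathbf{A}_j, skd_{j,a}, a\}$


Sign ((usk, warr), pkd, $m$, $\Psi$)

 0: Parse warr as $\{id_{i,j}, \mathbf{A}_{i,j}, \mathbf{z}_i, \mathbf{a}_i\}_{i \in [\delta],\, j \in [l]}$
 1: (ovk, osk) $\leftarrow$ OTS.KGen($\lambda$)
 2: $\mathbf{N} := H_0(ovk)$, $\mathbf{s} \leftarrow \mathbb{Z}_q^{n}$, $\mathbf{e}_1 \leftarrow \chi_2$, $\mathbf{e}_2 \leftarrow \chi_2$
 3: foreach $i \in [1, \delta]$ compute
 4:   $\mathbf{f}_i^{(1)} = \mathbf{D}^T\mathbf{s} + \mathbf{e}_1$, $\mathbf{f}_i^{(2)} = \mathbf{N}^T\mathbf{s} + \mathbf{e}_2 + \lfloor q/2 \rfloor\, id_i$ and set $\mathbf{f}_i = [\mathbf{f}_i^{(1)} \| \mathbf{f}_i^{(2)}]$
 5: foreach $i \in warr$
 6:   $\mathbf{R}_i \leftarrow [\mathbf{R} \,|\, \mathbf{R}_1^{i[1]} \,|\, \dots \,|\, \mathbf{R}_d^{i[d]}]$, $\mathbf{B}_i \leftarrow \mathbb{B}^{k_3 \times m_3}$, $\mathbf{E}_i \leftarrow \chi_3^{k_3 \times m_3}$
      Compute $\mathbf{C}_i \leftarrow \mathbf{B}_i\mathbf{R}_i + \mathbf{E}_i$
 7: Set $\mathbf{f} = \{\mathbf{f}_i\}$, $\mathbf{C} = \{\mathbf{C}_i\}_{i \in warr}$
 8: $\pi = (\{CMT_i, RSP_i, CH_i\}_{i=1}^{t}) \leftarrow$ ZKAoK($uid, \mathbf{z}_0, \{id_i, \mathbf{B}_i, \mathbf{E}_i\}_{i \in warr},$
      $\{\mathbf{e}_i, \mathbf{z}_i, \mathbf{a}_i\}, (\mathbf{C}, \mathbf{Q}, \mathbf{P}, \mathbf{f}, ovk, \Psi, pp), R_1$)
 9: $\sigma_0 \leftarrow$ OTS.Sign(osk, $H_2(m, \Psi, \pi, \mathbf{f}, \mathbf{C})$)
10: return $\sigma \leftarrow (\pi, \sigma_0, ovk, \mathbf{f}, \mathbf{C})$


Verify ($pkd_0$, $\sigma$, RevokeList)

 0: Parse $\sigma$ as $(\pi, \sigma_0, ovk, \{\mathbf{f}_i\}, \{\mathbf{C}_i\}_{i \in warr})$
 1: if $\exists \mathbf{y} \in$ RevokeList and $\exists \mathbf{C}_i : i \in warr$ s.t. $\|\mathbf{C}_i\mathbf{y}\| < n_3\beta_3$, return 0
 2: if ZKAoK.Verify($\pi, \Psi, m, \{\mathbf{C}_i\}_{i \in warr}$) $\neq 1$, return 0
 3: if OTS.Verify(ovk, $\sigma_0$, $H_2(m, \Psi, \pi, \mathbf{f}, \mathbf{C})$) $\neq 1$, return 0
 4: else return 1


Fig. 6. Algorithms AttIssue, Sign and Verify of our VLR-HABS construction. 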


5 Security, Efficiency and Extensions 


In this section, we state the security theorems, followed by efficiency considera- 
tions and parameter selection for our VLR-HABS scheme. We start by giving two 
lemmata that we will use in the analysis of our scheme. The proofs for Lemmas 
1 to 5, and Theorems 2 and 3 are given in the full version of this paper [20]. 


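Lemma 1. Let $\beta = \mathrm{poly}(n)$, $q > (2n\beta^2 + 1)^2$ and $m > 2n$. Then for a fixed 
$\mathbf{y} \in \mathbb{Z}_q^m$ with $\|\mathbf{y}\|_\infty < \beta$ and a uniformly random matrix $\mathbf{C} \leftarrow \mathbb{Z}_q^{n \times m}$, we have 
$\Pr[\|\mathbf{C}\mathbf{y}\|_\infty < n\beta^2] < \mathrm{negl}(n)$. 


Lemma 2. Let $\beta = \mathrm{poly}(n)$. Then for matrices $\mathbf{R}, \mathbf{B}, \mathbf{C}, \mathbf{E}$ and a vector $\mathbf{y}$ of
compatible dimensions over $\mathbb{Z}_q$ such that $\mathbf{R}\mathbf{y} = \mathbf{0}$ with $\|\mathbf{y}\|_\infty < \beta$ and
$\mathbf{C} = \mathbf{B}\mathbf{R} + \mathbf{E}$, where $\mathbf{B}, \mathbf{R}$ are uniformly random and $\mathbf{E}$ is drawn from a
$\beta$-bounded distribution $\chi$ over $\mathbb{Z}_q$, we have 
$\Pr[\|\mathbf{C}\mathbf{y}\|_\infty < n\beta^2] = 1$. 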


Theorem 2. Our VLR-HABS construction given in Figs. 5 and 6 is correct. 


Theorem 3. Let COM be a statistically hiding and computationally binding 
string commitment scheme. Then our VLR-HABS construction given in Figs. 5 
and 6 offers path anonymity, non-frameability and path traceability in the 
Random Oracle Model if the $\mathsf{LWE}_{n_2,m_2,q_2,\chi_2}$, $\mathsf{LWE}_{n_3,m_3,q_3,\chi_3}$, $\mathsf{SIS}_{n_5,m_5,q_5,\beta_5}$, 
$\mathsf{SIS}_{n_1,m_1,q_1,\beta_1}$ and $\mathsf{SIS}_{n_4,m_4,q_4,\beta_4}$ problems are computationally infeasible in $\lambda$, $H_0$ 
and $H_2$ are collision resistant and $H_1$ is a random oracle. 


5.1 Efficiency and Parameters 


We instantiate the scheme with the parameter choices given in Table 1. We used 
the estimator by Albrecht et al. [2] to evaluate the estimated security of each 
LWE and SIS instance. For the values relating to the Bonsai signature, GPV- 
IBE, OTS and KTX commitment scheme COM, we directly use conditions as 
given in their original works. For the ZKAoK we use the parameters of the 
underlying commitment scheme and use a soundness parameter $t = \omega(\lambda)$. The 
values for $n_i$ are assumed to be fixed and are typically small polynomials in $\lambda$. 
The VLR mechanism uses values from the trapdoor delegation in [10] restricted 
to the conditions of Lemmas 1 and 2. Public keys are elements of $\mathbb{Z}_{q_1}^{n_1 \times m_1}$, whose size 
is quadratic in $n_1$. The signing operation takes $t \cdot O(|warr|(n_1^2 + n_2^2 + n_3^2) + n_4^2)$ 
steps and the length of the signature is also quadratic in the parameter $n_1$. The 
revocation check that completes the verification algorithm is linear in the number 
of revoked users, which matches other VLR schemes such as [24] where they note 
this complexity for VLR seems unavoidable. The Trace and Judge algorithms are 
linear in |warr|. 

We briefly note some final generic changes to improve upon efficiency. Firstly, 
the protocol benefits from the pre-computation of offline/online signatures [36] 
that are naturally compatible with our one-time signature. Here, the OTS sig- 
nature is produced ahead of time (potentially batched), using a chameleon hash 
function for H3, that would allow the signer to find a corresponding randomness 
to match the message it must sign when generating the VLR-HABS signature. 
Secondly, commitments in ZKAoK can be hashed prior to sending to minimise 
size of the signature, at the expense of additional hash computations by the 
signer and verifier. Thirdly, delegation of a short basis has order O(n”). More 
efficient trapdoor delegations exist, e.g. the lattice trapdoor by Micciancio and 
Peikert [33]; however, it is not clear how to argue security, as the structure of 
the resulting Boyen signature does not lend itself to being embedded over multiple 
delegations. Finding a more efficient yet compatible trapdoor could be viewed 
as an interesting open problem. Finally, we note that using complexity assumptions 
and tools for ideal lattices [31] instead of integral lattices reduces the cost of most 
of the associated operations by about a linear factor in the security parameter. 


Table 1. Parameter Selection for VLR-HABS based on its building blocks. We target 
128-bit security and therefore set the soundness parameter t = 219. The ID bit length 
is d = 16, which supports 65536 entities across a hierarchy of depth l = 3. Note that 
the m_i are the numbers of samples in the LWE and SIS challenges. 


Building block      i   n_i    m_i    M_i     q_i    β_i     α_i
Bonsai Signature    1   500    618    9840    2°     31440   -
GPV-IBE             2   400    -      16800   2!°    -       10^-4
VLR                 3   1400   1840   29440   2°?    -       10^-3
OTS                 4   500    41     656     274    31440   -
COM                 5   400    -      25600   24°    3200    -


5.2 Revoking Attributes 


We now briefly describe how to achieve attribute revocation for our VLR-HABS 
scheme. We observe that we can apply similar techniques to those used to revoke 
users. In particular, the signer, upon generating a VLR-HABS signature, also 
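commits to the attributes as LWE instances $\mathbf{C} = \mathbf{B}\mathbf{A} + \mathbf{E}$, where $\mathbf{A}$ is the binary 
decomposition of an attribute concatenated with $\mathbf{R}$, i.e. $[\mathbf{R} \| \mathbf{a}]$. Revocation is then 
performed by the authority by computing a short vector $\mathbf{y}$ such that $\|\mathbf{C}\mathbf{y}\| < n\beta^2$. 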
The argument ZKAoK would have to be modified to show that the attributes 
are correctly committed to. Minor modifications to the security properties path 
anonymity, non-frameability and path traceability would be required. In partic- 
ular, in path anonymity we prevent the adversary from revoking an attribute 
used in the challenge signature, as this would allow it to trivially break path 
anonymity. This can be achieved by standard bookkeeping techniques and the 
proofs follow a similar strategy. 

Revoking an attribute for a specific user is similarly possible by altering the 
delegation process to delegate an attribute of the form $id \| att$ (the bit-string 
concatenation) and further requiring the ZKAoK to link id to the identity used 
in $\mathbf{A}_{id}$ and $\mathbf{R}_{id}$. This could be achieved using the techniques already used in the 
zero-knowledge protocol. We finally note that this would require the policy to 
be encoded in a specific format that, intuitively, ignores the first $d$ bits of the 
attribute $id \| att$. 
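
The encoding can be pictured with a few lines of code. This is only a sketch of the bit-string bookkeeping; the helper names and the representation of identities and attributes as lists of bits are assumptions made for the example.

```python
def bind_attribute(id_bits, att_bits):
    """Form the user-specific attribute id || att as a list of bits."""
    return list(id_bits) + list(att_bits)

def policy_view(bound_attribute, d):
    """Return the part evaluated by the policy, ignoring the first d identity bits."""
    return bound_attribute[d:]

def is_bound_to(bound_attribute, id_bits):
    """Check that the attribute was delegated for the given identity."""
    return bound_attribute[:len(id_bits)] == list(id_bits)

user_id = [1, 0, 1, 1]
attribute = [0, 1, 1]
bound = bind_attribute(user_id, attribute)
print(policy_view(bound, len(user_id)) == attribute, is_bound_to(bound, user_id))
```

Two users holding the same attribute obtain different strings $id \| att$, so one of them can be revoked without affecting the other, while the policy still sees the same attribute.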
6 Conclusion 


The VLR-HABS scheme proposed in this paper improves upon security and func- 
tionality of existing HABS constructions by proposing a lattice-based scheme 
which supports verifier-local revocation and a wider range of signing policies. 
Our scheme is based on LWE and SIS assumptions which are believed to offer 
post-quantum security. It supports inner-product relations which allow for con- 
junctive, disjunctive and threshold policies as well as polynomial evaluations of 
attributes. Revocation in our HABS scheme uses a novel VLR mechanism that 
allows revocation of signers, attributes as well as intermediate authorities. Our 
scheme also implies the first lattice-based (non-hierarchical) ABS scheme with 
these properties. 
Abstract. Introduced by von Ahn et al. (STOC’05), covert two-party 
computation is an appealing cryptographic primitive that allows Alice 
and Bob to securely evaluate a function on their secret inputs in a 
steganographic manner, i.e., even the existence of a computation is obliv- 
ious to each party - unless the output of the function is favourable to 
both. A prominent form of covert computation is covert authentication, 
where Alice and Bob want to authenticate each other based on their 
credentials, in a way such that the party who does not hold the appro- 
priate credentials cannot pass the authentication and is even unable to 
distinguish a protocol instance from random noise. Jarecki (PKC’14) put 
forward a blueprint for designing covert authentication protocols, relying 
on a covert conditional key-encapsulation mechanism, an identity escrow 
scheme, a covert commitment scheme and a Σ-protocol satisfying several 
specific properties. He also proposed an instantiation based on the 
Strong RSA, the Decisional Quadratic Residuosity and the Decisional 
Diffie-Hellman assumptions. Despite being very efficient, Jarecki’s con- 
struction is vulnerable against quantum adversaries. In fact, designing 
covert authentication protocols from post-quantum assumptions remains 
an open problem. 

In this work, we present several contributions to the study of covert 
authentication protocols. First, we identify several technical obstacles in 
realizing Jarecki’s blueprint under lattice assumptions. To remedy this, we 
then provide a new generic construction of a covert Mutual Authentication 
(MA) protocol that departs from the given blueprint and that requires 
somewhat weaker properties regarding the employed cryptographic ingredients. 
Next, we instantiate our generic construction based on commonly 
used lattice assumptions. The protocol is proven secure in the random 
oracle model, assuming the hardness of the Module Learning With Errors 
(M-LWE) and Module Short Integer Solution (M-SIS) and the NTRU 
problems, and hence, is potentially quantum-safe. In the process, we also 
develop an approximate smooth projective hashing function associated 
with a covert commitment, based on the M-LWE assumption. We then 
demonstrate that this new ingredient can be smoothly combined with 
existing lattice-based techniques to yield a secure covert MA scheme. 
1 Introduction 


The major goal of cryptography is to protect the security of the computation and 
communication over insecure networks. Steganography, on the other hand, aims 
to hide the very fact that some computation or communication has taken place. 
Covert cryptography is the research area that aims to simultaneously achieve 
the goals of both cryptography and steganography, i.e., to ensure the security of 
cryptographic protocols and to hide their existence from adversaries at the same 
time. A secure protocol is said to be covert if the communications between two 
parties cannot be distinguished from the message flows in the public channel. 
Note that this is only possible when the public channel is steganographic, namely, 
it contains sufficient min-entropy. An example of a steganographic channel is the 
random channel, where messages are uniformly random over some finite ranges. 

The study of covert cryptography was initiated by von Ahn et al. [40], who 
introduced the notion of covert two-party computation. Chandran et al. [9] sub- 
sequently generalized this notion to the multi-party setting. In these protocols, 
participants can compute any functionality of their inputs in a way such that 
no observer can distinguish the exchanged messages from random flows in the 
public channel, and, even protocol participants cannot determine whether the 
other party is following the protocol. In both constructions from [9,40], the pro- 
tocols require a linear number of rounds in the circuit representations of the 
desired functions. In fact, Goyal and Jain [22] later showed that maliciously- 
secure covert computations could not be done in a constant number of rounds if 
there is no access to trusted parameters. However, this impossibility result can 
be by-passed if one assumes the existence of trusted parameters or public keys - 
which are mostly available in practical applications. 

A prominent sub-area of covert cryptography is the study of covert authen- 
tication. In such protocols, two parties aim to mutually authenticate each other 
using verifiable certificates in a covert manner: a dishonest party who does not 
possess a valid certificate is not only unable to succeed in the authentication 
but also cannot distinguish a protocol instance from a random channel mes- 
sage. Jarecki [24] gave the first constant-round construction of covert mutual 
authentication (consisting of 5 rounds - which can be reduced to 3 rounds in 
the random oracle model). His protocol additionally supports the revocations 
of group membership, and is proven secure under the strong RSA, the DQR, 
and the DDH assumptions. The protocol is practically efficient, but it is vulner- 
able against quantum adversaries. To date, the design of covert authentication 
protocols based on post-quantum assumptions remains an open problem. 

In this work, we aim to tackle the above discussed open question. Specif- 
ically, we study the plausibility of constructing covert authentication proto- 
cols based on lattice-based assumptions - which are among the most prominent 
foundations for cryptography in the post-quantum era. Lattice-based cryptogra- 
phy [1, 18,19, 21,37,38] is an emerging research direction that receives significant 
attention from the community. Lattices have enabled virtually any cryptographic 


482 R. Kumar and K. Nguyen 


primitives one can think of. It would be quite natural to think that it is tech- 
nically straightforward to obtain a lattice-based covert authentication scheme. 
However, we observe that there are non-trivial challenges on the way. 

In [24], Jarecki gave a blueprint to construct a covert mutual authentication 
(MA) protocol, based on an identity escrow scheme [28] (namely, an interac- 
tive form of group signatures [10]) and a covert conditional key encapsulation 
mechanism (CKEM) scheme. The latter ingredient, i.e., CKEM, can be seen as 
an encryption counterpart of zero-knowledge proofs (ZKP) [20] or as a general- 
ization of smooth projective hash (SPH) functions [12] to interactive protocols. 
Jarecki designed a covert CKEM with the witness-extraction property (so that 
it would be possible to extract a group certificate in case of a forgery) via a 
combination of an SPH, a covert commitment scheme (i.e., one that produces 
uniformly random commitment values) and a Σ-protocol [11] with some special 
properties. The main idea is to let the prover covertly commit to his first mes- 
sage a as com, send response z to the challenge c from the verifier and then 
execute an SPH with the verifier on the statement that a, which is supposed to 
be recoverable based on (x, c, z), is indeed contained in com. We refer the reader 
to the original paper [24] for details on this generic construction. 

While Jarecki’s blueprint [24] can be efficiently instantiated from traditional 
number-theoretic assumptions, we note three distinctions in the lattice 
setting: (i) lattice-based primitives typically have to deal with noise [38], and as 
a consequence, it is notoriously hard to obtain exact versions of smooth projective 
hashing [5,25,26,42]; (ii) existing efficient lattice-based Σ-protocols [6,16,34] 
normally admit a gap in soundness, namely, the language for which soundness 
can be achieved is a strict superset of the one used for defining 
zero-knowledgeness¹; (iii) protocol messages in the lattice setting are not always 
uniformly random, e.g., they can be samples from discrete Gaussian distributions. These 
aspects make it challenging to realize covert MA protocols from lattice assumptions. 
These obstacles also inspire us to revisit Jarecki’s generic construction: Can 
we achieve covert MA based on somewhat weaker assumptions on the underlying 
cryptographic ingredients? 


OUR RESULTS AND TECHNIQUES. This work provides several contributions to 
the study of covert mutual authentication protocols. First, we revisit the notion 
of covertness defined in [24]. Instead of specifically requiring a uniformly ran- 
dom channel, we suggest a generalized formulation by assuming that the public 
channel messages are distributed according to a probability distribution that is 
efficiently and publicly sampleable. We then say that an interactive protocol is 
covert if its transcript can be efficiently simulated by a simulator that only has 
access to the public information. Second, we provide a new generic construction 
of covert mutual authentication, that relies on an approximate smooth projective 
hashing (ASPH) scheme with associated covert commitment, a key reconcilia- 
tion scheme and a group authentication scheme. The first two ingredients can 
handle the noises as well as the soundness gap, that occur in the lattice setting, 


1 There exist exact lattice-based zero-knowledge proof systems with no soundness gap, 
e.g., [7, 15,32], however, they tend to be relatively less efficient. 
as discussed above. Meanwhile, the third ingredient can be seen as an interactive 
version of group signatures, where there is no opening authority that can break 
users’ anonymity². Hence, while our construction does not support user 
revocation, it can achieve a stronger security notion than the one from [24], which 
we call external covertness. This robust property guarantees that any adversary 
having access to all the public and private information of the protocol will not 
be able to distinguish between an actual protocol transcript and a simulated 
transcript sampled according to a given distribution. 

Our next contribution is to instantiate the new generic construction from 
lattice assumptions. To this end, we provide a construction of ASPH based on 
module lattices. An ASPH scheme with covert commitment aims to compute two 
“nearby” hash values of the message. The first hash value is obtained by using 
the hashing key and the commitment, while the second one is computed using 
the projective key and randomness used in the commitment. Our construction 
is adapted from the Katz-Vaikuntanathan construction [26] that operates in 
general lattices. We observe that the encryption scheme used for ASPH in [26] 
can be replaced by a commitment scheme. The scheme’s public information 
consists of random matrices $\mathbf{A}_1$ and $\mathbf{A}_2$ that are “tall”, i.e., their numbers of rows 
are significantly greater than their numbers of columns. In this way, matrices 
$\mathbf{A}_1, \mathbf{A}_2$ do represent sparse random lattices. A commitment to a message is then 
a Learning-With-Errors (LWE) instance [38] of the form 


$\mathsf{com}(\mathbf{m}; \mathbf{r}) := \mathbf{A}_1\mathbf{m} + \mathbf{A}_2\mathbf{r} + \mathbf{e},$


for which the LWE secret is the message $m$ concatenated with a random vector $r$. 
The hiding property of the scheme follows from the Module-LWE assumption [8,30]. 
As in [3,6], we consider a relaxed notion of binding for the employed 
commitment scheme, in which the set of acceptable openings could be a superset 
of the set of honestly generated (message, randomness) pairs. More specifically, we 
consider the set containing the tuples $(\mathrm{com}, m, r)$ for which there exists a ring 
element $z$ such that $\|z(\mathrm{com} - A_1 m) - A_2 r\|$ is small. Using a technical lemma 
about the length of the shortest vector in the lattice generated by a random 
matrix, we can prove the relaxed binding property of the scheme. In the ASPH 
scheme, to compute the first hash value $h$, we sample a vector $f$ from a discrete 
Gaussian distribution. Then, the hash value $h$ is $f^T(\mathrm{com} - A_1 m)$ and the 
projection key pk is $f^T A_2$. The second hash value $h'$ is $\mathrm{pk} \cdot r$. It is easy to show 
the correctness of the ASPH protocol. The main challenge here is to prove 
soundness, for which we need to show that when the given commitment to a 
message is not contained in the relaxed set, $(\mathrm{pk}, h)$ is statistically close to 
uniform over the respective domain. To this end, we use a theorem from [25] 
about the distribution of a matrix multiplied by a vector sampled from a Gaussian 
distribution. Suppose the lattice generated by the matrix has a significantly 
large shortest vector. In that case, the distribution of the matrix multiplied by 
a vector sampled from a Gaussian distribution is indistinguishable from a uniform 
distribution. As the commitment is not contained in the relaxed set, we know 
that the lattice generated by the matrix $[(\mathrm{com} - A_1 m) \mid A_2]$ has a significantly 
large shortest vector, and by using this property, we can demonstrate the soundness 
of the ASPH scheme. This technical step is indeed the biggest hurdle that 
prevented us from directly using any of the previous lattice-based commitment 
schemes, such as [3,6,13,17,27,41]. 


2 Alternatively, one can view group authentication as an interactive form of ring 
signatures [39], where there is a centralized authority who is in charge of user enrollment. 
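
The relationship between the commitment and the two "nearby" hash values can be illustrated with a small numerical sketch. The code below is only a toy under several assumptions that are not in the paper: it works over plain integers modulo $q$ instead of a module over $R_q$, it omits any additional public offset, it draws $f$ from a small uniform range as a stand-in for a discrete Gaussian, and all names and parameter sizes (`demo`, `hash_alice`, `hash_bob`, `q = 12289`) are hypothetical. It shows that $h - h' = f^T e$, so the two hash values differ only by a short error term.

```python
import random

def centered(x, q):
    """Representative of x mod q with the smallest absolute value (q odd)."""
    x %= q
    return x - q if x > q // 2 else x

def mat_vec(A, v, q):
    """Matrix-vector product over Z_q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in A]

def commit(A1, A2, m, r, e, q):
    """Toy commitment com(m; r) = A1*m + A2*r + e over Z_q."""
    return [(u + w + err) % q
            for u, w, err in zip(mat_vec(A1, m, q), mat_vec(A2, r, q), e)]

def hash_alice(f, com, A1, A2, m, q):
    """First hash h = f^T (com - A1*m) and projection key pk = f^T A2."""
    diff = [(c - u) % q for c, u in zip(com, mat_vec(A1, m, q))]
    h = sum(fi * di for fi, di in zip(f, diff)) % q
    pk = [sum(fi * row[j] for fi, row in zip(f, A2)) % q
          for j in range(len(A2[0]))]
    return h, pk

def hash_bob(pk, r, q):
    """Second hash h' = pk . r, computed from the commitment randomness r."""
    return sum(p * x for p, x in zip(pk, r)) % q

def demo(seed=0):
    rng = random.Random(seed)
    q, rows, n, k = 12289, 16, 2, 3                 # hypothetical toy sizes
    A1 = [[rng.randrange(q) for _ in range(n)] for _ in range(rows)]
    A2 = [[rng.randrange(q) for _ in range(k)] for _ in range(rows)]
    m = [rng.randrange(q) for _ in range(n)]
    r = [rng.randrange(q) for _ in range(k)]
    e = [rng.randint(-2, 2) for _ in range(rows)]   # small noise
    f = [rng.randint(-3, 3) for _ in range(rows)]   # stand-in for a Gaussian sample
    com = commit(A1, A2, m, r, e, q)
    h, pk = hash_alice(f, com, A1, A2, m, q)
    gap = abs(centered(h - hash_bob(pk, r, q), q))  # equals |f^T e|
    bound = sum(abs(fi) * abs(ei) for fi, ei in zip(f, e))
    return gap, bound
```

Calling `demo()` returns the observed distance between the two hash values together with the worst-case bound $\sum_i |f_i||e_i|$; the distance never exceeds the bound, which is the "approximate" correctness that the key reconciliation step later absorbs.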

An additional lattice-based technical ingredient employed in our construction 
is a relatively efficient group authentication (GA) scheme. A GA scheme aims to 
assign a certificate to group members, and enable the latter to prove their legiti- 
mate group membership via an interactive proof system. To this end, we extract 
a GA scheme from the lattice-based group signature of [14], which is arguably 
the most efficient option available to date³. As per Jarecki’s blueprint, we need a 
Σ-protocol satisfying special properties for proving the relation capturing group 
certificate validity. However, due to the soundness gap of the protocol in [14], we 
are unable to prove the special soundness property on the same relation. Never- 
theless, we demonstrate that special soundness holds in a relaxed manner, i.e., 
it holds for a superset of the relation corresponding to certificate validity, and 
then show that this relaxation is sufficient for our application. We note that the 
security notion we achieve here is stronger than the notion of certificate unforge- 
ability considered in [24] - we refer to this property as strong unforgeability. Yet, 
the security of our construction relies on the same computational assumptions 
as in [14], namely, Module-LWE, Module-SIS, and NTRU. 

As a summary, the generic construction and the lattice-based realization we 
suggest here considerably depart from the specifications of Jarecki’s blueprint. 
We generalize the ideas of [24] and show that our modifications are sufficient 
to achieve covert mutual authentication in general and in the lattice setting, 
despite relying on somewhat weaker cryptographic ingredients. Our lattice-based 
protocol consists of 5 rounds and can be reduced to 3 rounds in the random oracle 
model. The scheme inherits efficiency features from the employed lattice-based 
building blocks [3, 14,25] without a significant change in parameters. 


ORGANIZATION. The rest of the paper is organized as follows. In Sect. 2, we pro- 
vide our definitions and model of covertness and covert mutual authentication 
(MA), as well as definitions of cryptographic ingredients needed for our con- 
structions: covert commitment schemes, approximate smooth projective hashing 
(ASPH), key reconciliation and group authentication (GA) protocols. In Sect. 3, 
we present our generic construction of covert MA. In Sect. 4, we recall some 
necessary background on lattices and the computational assumptions we will 
employ. Then, in Sect. 5, we present our lattice-based ASPH scheme on covert 
commitment - which is a major technical building block for instantiating our 
construction of covert MA based on lattices. 


3 Note that we can extract a GA scheme from other existing lattice-based group 
signature systems, such as [29,31,33], but it would be much less efficient. 
2 Cryptographic Definitions and Models 


2.1 Covertness and Covert Mutual Authentication 


Covertness. To define the covertness of two-party protocols, we assume that 
the protocol runs over a public channel with periodic message flow from some 
probability distribution $\mathcal{T}$, which is efficiently sampleable based on the public 
information of the protocol. A protocol is said to be covert if the communication 
between two parties cannot be efficiently distinguished from the message flow 
in the public channel. This is only possible when the public channel is 
steganographic, i.e., it has sufficient min-entropy. One example of steganographic 
channels is a random channel where messages are randomly distributed over some 
finite range, as used in [24]. 


Covert Mutual Authentication. In this work, we are interested in (implicit) 
mutual authentication protocols based on the membership of a given group. Such 
a protocol allows two certified group members to establish a random shared key 
if they both honestly follow the protocol. 

A group involves a group manager (GM) and a polynomial (in the security 
parameter $\tau$) number of group members. A Mutual Authentication (MA) 
protocol is a triple of algorithms (KG, CG, Auth). Algorithm $\mathrm{KG}(1^\tau)$ returns 
(mpk, msk), where msk (master secret key) is known only to the GM and mpk 
(master public key) is public information. For the group member with identity $i$, 
the GM assigns a certificate $sk_i \leftarrow \mathrm{CG}(i, msk)$. For authentication between $P_i$ and 
$P_j$, both parties run the interactive protocol Auth with $P_i$’s input $(mpk, (sk_i, i))$ and 
$P_j$’s input $(mpk', (sk_j, j))$, and get keys $K$ and $K'$, respectively. If $mpk = mpk'$ 
and if $(sk_i, i)$ and $(sk_j, j)$ are valid group certificates under mpk, then $K = K'$. 
Otherwise, $(K, K')$ are independent and uniformly random. 

We say that an MA protocol is covert if it satisfies the properties of internal 
covertness and external covertness, defined as follows. 


1. Internal Covertness: There exists an efficiently sampleable distribution $\mathcal{T}$ such 
that for any PPT adversary (excluding the group manager and group members) 
acting as one of the parties in the authentication protocol, it is infeasible for 
the adversary to distinguish with non-negligible advantage whether the honest 
party is following the protocol or sending messages generated according 
to distribution $\mathcal{T}$. 


2. External Covertness: There exists an efficiently sampleable distribution $\tilde{\mathcal{T}}$ such 
that for any PPT adversary (including the group manager and group members) 
who does not have access to the randomness used in the execution 
of the protocol, it is infeasible for the adversary to distinguish with 
non-negligible advantage between the transcript generated by a valid execution 
of the protocol and a transcript sampled according to distribution $\tilde{\mathcal{T}}$. 


We define security games $G$ and $\tilde{G}$ for PPT adversaries $\mathcal{A}$ and $\tilde{\mathcal{A}}$, denoted 
by $G_{\mathcal{A}}(1^\tau, b)$ and $\tilde{G}_{\tilde{\mathcal{A}}}(1^\tau, b)$, respectively, where game $G$ represents the internal 
covertness property and game $\tilde{G}$ represents the external covertness property. 
Adversary $\mathcal{A}$ only has access to the public parameters of the protocol. In terms 
of known information, adversary $\tilde{\mathcal{A}}$ is more powerful than $\mathcal{A}$ and has access to 
msk and the certificates $sk_i$ for all group members. Let $u$ and $\tilde{u}$ be sequences 
of random bits sampled from some fixed, efficiently sampleable distributions $\mathcal{T}$ 
and $\tilde{\mathcal{T}}$, respectively. 


- Generate $(mpk, msk) \leftarrow \mathrm{KG}(1^\tau)$. Let $N := \mathrm{poly}(\tau)$ be the number of group 
members and compute $sk_i \leftarrow \mathrm{CG}(msk, i)$ for $i \in [N]$. 

- Game $G_{\mathcal{A}}(1^\tau, b)$: 

1. Adversary $\mathcal{A}$ is allowed to make $\mathrm{poly}(\tau)$ calls to Exec(·). 
   • Exec($i$): Execute the Auth protocol with input $(mpk, (sk_i, i))$, 
     interacting with adversary $\mathcal{A}$. 
2. Adversary $\mathcal{A}$ returns the identity $i^*$ of a group member. 
3. Adversary $\mathcal{A}$ is allowed to make only one call to Test($i^*$). 
   • Test($i$): If $b = 1$, then execute the Auth protocol with input 
     $(mpk, (sk_i, i))$, interacting with adversary $\mathcal{A}$, and send the local 
     output $K$ to $\mathcal{A}$. Otherwise, send a random message $u$ sampled from the 
     distribution $\mathcal{T}$ and send a random key to adversary $\mathcal{A}$. 
4. When $\mathcal{A}$ halts and outputs a bit $b^*$, the game outputs the same bit $b^*$. 

- Game $\tilde{G}_{\tilde{\mathcal{A}}}(1^\tau, b)$: 

1. $\tilde{\mathcal{A}}$ is given the key pair $(mpk, msk)$ and the certificates $sk_i$ for all $i \in [N]$. 
2. $\tilde{\mathcal{A}}$ returns the identities $i^*$ and $j^*$ of two group members. 
3. Adversary $\tilde{\mathcal{A}}$ is allowed to make only one call to ExtTest($i^*, j^*$). 
   • ExtTest($i, j$): If $b = 1$, then the challenger sends a transcript of an 
     authentication protocol between group members $i$ and $j$. Otherwise, 
     the challenger sends a string $\tilde{u}$ sampled from the distribution $\tilde{\mathcal{T}}$. 
4. $\tilde{\mathcal{A}}$ halts and outputs a bit $b^*$. Game $\tilde{G}$ outputs the same bit $b^*$. 


Definition 1. An MA scheme (KG, CG, Auth) is said to satisfy the internal 
covertness property if for any PPT adversary $\mathcal{A}$, the advantage 
$\epsilon = |\Pr[G_{\mathcal{A}}(1^\tau, 0) = 1] - \Pr[G_{\mathcal{A}}(1^\tau, 1) = 1]|$ is negligible in $\tau$. 

Definition 2. An MA scheme (KG, CG, Auth) is said to satisfy the external 
covertness property if for any PPT adversary $\tilde{\mathcal{A}}$, the advantage 
$\epsilon = |\Pr[\tilde{G}_{\tilde{\mathcal{A}}}(1^\tau, 0) = 1] - \Pr[\tilde{G}_{\tilde{\mathcal{A}}}(1^\tau, 1) = 1]|$ is negligible in $\tau$. 
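
The advantage in these definitions is a difference of two acceptance probabilities, which can be estimated empirically for toy games. The following sketch is not part of the paper: `estimate_advantage`, `leaky_game`, `covert_game` and `first_bit_adversary` are hypothetical names, the "protocol messages" are just 8-bit strings, and the adversary is a fixed simple test. It only illustrates how a biased message distribution yields a large advantage, while a message distribution identical to the channel distribution yields an advantage near zero.

```python
import random

def estimate_advantage(game, adversary, trials=2000, seed=0):
    """Estimate |Pr[adversary outputs 1 | b=0] - Pr[adversary outputs 1 | b=1]|."""
    rng = random.Random(seed)
    ones = {0: 0, 1: 0}
    for b in (0, 1):
        for _ in range(trials):
            ones[b] += adversary(game(b, rng), rng)
    return abs(ones[0] - ones[1]) / trials

def leaky_game(b, rng):
    """b=0: uniform 8-bit message; b=1: 'protocol' message whose first bit is always 1."""
    bits = [rng.randrange(2) for _ in range(8)]
    if b == 1:
        bits[0] = 1
    return bits

def covert_game(b, rng):
    """Both branches produce uniform messages, so they are identically distributed."""
    return [rng.randrange(2) for _ in range(8)]

def first_bit_adversary(message, rng):
    """Guess 'real protocol' exactly when the first bit is 1."""
    return message[0]
```

With these toy games, `estimate_advantage(leaky_game, first_bit_adversary)` is close to 1/2, whereas `estimate_advantage(covert_game, first_bit_adversary)` is close to 0. A real covertness proof must of course bound the advantage of every PPT adversary, not of one fixed test.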


2.2 Covert Commitment Schemes 


Let $\Pi$ = (Gen, Com, Verify) be a commitment scheme with message space $\mathcal{M}$. 
For security parameter $\lambda$, algorithm Gen($\lambda$) generates the commitment public 
key e. For any message $m \in \mathcal{M}$, algorithm Com(m, e) computes the commitment 
c and witness r. To open the commitment c, given witness r and message m, 
verification algorithm Verify(c, r, m) outputs 1 for accept or 0 for reject. 

The standard security properties of commitment schemes are binding and 
hiding, which can be defined in the perfect, statistical or computational sense. 
Here, we require the covertness property, which says that for any message 
$m \in \mathcal{M}$, the distribution of the commitment value c over the randomness r is 
indistinguishable from the uniform distribution over the commitment space. Note that 
covertness is a stronger notion than hiding, i.e., the former implies the latter. 
2.3 Approximate Smooth Projective Hashing 


We adapt from [26] the definitions of Approximate Smooth Projective Hash 
Function (ASPH). Let $\Psi$ and $\Psi^*$ be binary relations on some sets $\mathcal{X}$ and $\mathcal{W}$, 
such that $(\mathcal{X} \times \mathcal{W}) \supseteq \Psi^* \supseteq \Psi$. Let $\Pi$ = (Hash, PHash) be a pair of algorithms for a 
$\delta$-ASPH scheme over relations $\Psi$ and $\Psi^*$. Let Alice’s input be $x_A$ and Bob’s input 
be $(x_B, w)$. Alice computes $(pk, h) := \mathrm{Hash}(x_A; r)$ and sends the projection key 
pk to Bob. Bob computes the hash value $h' := \mathrm{PHash}(pk, x_B, w)$. It is a $\delta$-ASPH 
scheme if it satisfies the following properties. 


- Completeness: If $(x_A, w) \in \Psi$ and $x_A = x_B$ then 
$\Pr[\|h - h'\|_\infty > \delta] = \mathrm{negl}$. 


- Soundness: If $(x_A, w) \notin \Psi^*$, then $(pk, h)$ is statistically close to uniform over 
the respective domain⁴. 

- Covertness: There exists an efficiently sampleable distribution $\$(U_{pk})$ such 
that the distribution of $pk \leftarrow \mathrm{Hash}(x)$ for any $x$ is computationally 
indistinguishable from the distribution $\$(U_{pk})$. 


2.4 Key Reconciliation Schemes 


The aim of a Key Reconciliation (KR) scheme is to generate a common secret 
if and only if Alice and Bob have “close by” secrets. Let $q \in \mathbb{Z}^+$ and $\delta \in \mathbb{R}^+$. 
Suppose that Alice and Bob possess secrets $d_1$ and $d_2$, respectively, such that 
$d_1$ is uniformly random in $\mathbb{Z}_q$ and $|d_1 - d_2| < \delta$. Then $\Pi = (\mathrm{Enc}_\delta, \mathrm{Dec}_\delta)$, 
where Alice and Bob run the algorithms $\mathrm{Enc}_\delta$ and $\mathrm{Dec}_\delta$, respectively, is a key 
reconciliation scheme if the following properties are satisfied. 


- $\mathrm{Enc}_\delta(d_1; r)$ computes the secret $\eta$ and $f$ such that the distribution of $(\eta, f)$ is 
indistinguishable from uniform in some given ranges of integers. 
- $\mathrm{Dec}_\delta(d_2, f)$ computes the secret $\eta'$. If $|d_1 - d_2| < \delta$, then $\eta = \eta'$. 


In this work, we employ the key reconciliation scheme from [25]. Let 
$t := \lfloor \log q \rfloor$ and $b := \lceil \log \delta \rceil$. The scheme proceeds as follows. 


- $\mathrm{Enc}_\delta(d_1; r)$: Let $r_b = 1$ and $r_{b+1} = 0$. For all $j \in [t] \setminus \{b, b+1\}$, sample 
$r_j \leftarrow \{0,1\}$. Then compute $f = d_1 + \sum_{j=0}^{t-1} r_j 2^j \bmod q$ and $\eta = \sum_{j=b+2}^{t-1} r_j 2^{j-b-2}$. 


- $\mathrm{Dec}_\delta(d_2, f)$: Compute $\eta' = \left\lfloor \frac{(f - d_2) \bmod q}{2^{b+2}} \right\rfloor$. 


By construction, the distribution of the pair $(f, \eta)$ is indistinguishable from 
uniformly random integers in $([q], [2^{t-b-2} - 1])$. We refer to [25, Section 3.2] for 
more details. 
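
A masking-based reconciliation of this kind can be sketched in a few lines. The code below is an illustration under explicit assumptions rather than the scheme of [25] verbatim: it assumes an integer $\delta$, a $t$-bit random mask whose bit $b$ is fixed to 1 and bit $b+1$ to 0, a key $\eta$ read from the mask bits above position $b+1$, and decoding by dropping the low $b+2$ bits of $(f - d_2) \bmod q$. The function names `enc`, `dec` and `check_kr` are hypothetical.

```python
import random

def enc(d1, q, delta, rng):
    """Mask d1 with t random bits; bit b is fixed to 1 and bit b+1 to 0."""
    t = q.bit_length() - 1            # floor(log2 q)
    b = (delta - 1).bit_length()      # ceil(log2 delta) for integer delta >= 1
    bits = [rng.randrange(2) for _ in range(t)]
    bits[b], bits[b + 1] = 1, 0
    mask = sum(bit << j for j, bit in enumerate(bits))
    f = (d1 + mask) % q               # uniform mod q whenever d1 is uniform
    eta = mask >> (b + 2)             # key = mask bits b+2, ..., t-1
    return eta, f

def dec(d2, f, q, delta):
    """Recover the key from a nearby secret d2 and the hint f."""
    b = (delta - 1).bit_length()
    return ((f - d2) % q) >> (b + 2)

def check_kr(q=12289, delta=16, trials=1000, seed=0):
    """Check that dec recovers eta whenever |d1 - d2| < delta."""
    rng = random.Random(seed)
    for _ in range(trials):
        d1 = rng.randrange(q)
        d2 = (d1 + rng.randint(-(delta - 1), delta - 1)) % q
        eta, f = enc(d1, q, delta, rng)
        if dec(d2, f, q, delta) != eta:
            return False
    return True
```

The two fixed bits place the low part of the mask in $[2^b, 2^{b+1})$, so adding a difference of absolute value below $\delta \le 2^b$ can neither underflow nor carry into bit $b+2$; this is why the high bits survive the noise.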


4 In this work we use a relaxed soundness condition. We show that the Soundness 
property holds over the overwhelming proportion of instances. 
2.5 Group Authentication Protocols 


Group Authentication (GA) can be viewed as an interactive form of group sig- 
natures, in which there is no opening authority who can break group members’ 
anonymity. A GA protocol allows Alice to convince Bob that she is a valid group 
member without revealing any additional information. 

A GA scheme is a tuple of algorithms (KG, CG, Ver, Ver*, Com, Σ). Let $C$ 
be the challenge set and $\bar{C} := \{c_1 - c_2 \mid c_1 \neq c_2 \in C\}$. Algorithm KG($\lambda$), where $\lambda$ 
is the security parameter, generates the group public key gpk and group secret 
key gsk, where gpk is public information and gsk is the private information 
of the Group Manager (GM). Let $S$ be the set of identities of group members. 
For any identity $i \in S$, algorithm CG($i$, gsk) generates a certificate $sk_i$ for the 
group member with identity $i$, such that $\mathrm{Ver}(gpk, (sk_i, i)) = 1$. Let $\Psi^{GA}$ be the 
committed certificate validity relation, 


$$\Psi^{GA} := \{ ((gpk, C), (sk, i, r)) \mid \mathrm{Ver}(gpk, (sk, i)) = 1 \text{ and } C = \mathrm{Com}(i; r) \}.$$


Let Ver* be a relaxed verification check, associated with the set $\bar{C}$. Let $\tilde{\Psi}^{GA} \supseteq \Psi^{GA}$ 
be the relaxed certificate validity relation, 


$$\tilde{\Psi}^{GA} := \{ (gpk, (sk, i, c)) \mid \mathrm{Ver}^*(gpk, (sk, i, c)) = 1 \}.$$


We call a GA scheme on relations $\Psi^{GA}$ and $\tilde{\Psi}^{GA}$ secure if it satisfies the 
following properties. 


1. Strong Unforgeability: For any PPT adversary $\mathcal{A}$, the probability that $\mathcal{A}$, given gpk 
as input, outputs $(sk, i, c) \leftarrow \mathcal{A}(gpk)$ such that $(gpk, (sk, i, c)) \in \tilde{\Psi}^{GA}$ 
is negligible in $\lambda$. 

2. Special Σ-Protocol: The relations $(\Psi^{GA}, \tilde{\Psi}^{GA})$ admit a Special Σ-protocol. 

3. Covertness of Commitment: The commitment scheme Com is covert. 


3 Covert Mutual Authentication: Generic Constructions 


In this section, we first describe a generic construction for covert Mutual Authen- 
tication (MA) schemes. To this end, we start with a Group Authentication (GA) 
scheme, then convert it into a covert MA scheme using an Approximate Smooth 
Projective Hashing (ASPH) with an associated covert commitment scheme and a 
Key Reconciliation (KR) scheme. Recall that Jarecki’s generic construction [24] 
uses an exact smooth projective hashing. Here, in contrast, we show that ASPH 
is sufficient for the design of covert MA. We note that our construction does not 
support the revocation of group membership, but it enjoys a stronger security 
guarantee than the MA protocol proposed in [24], namely, external covertness. 
We then instantiate our construction under lattice-based assumptions, using the 
technical ingredients we developed in the previous sections. 
3.1 Our Generic Construction of Covert MA 


Our generic construction employs the following ingredients. 


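- A GA scheme $\Pi_{GA} = (\mathrm{KG}_{GA}, \mathrm{CG}_{GA}, \mathrm{Ver}, \mathrm{Ver}^*, \mathrm{Com}_{GA}, \Sigma)$ with Special 
Σ-protocol $\Sigma = (P_1, P_2, V)$ on relations $\Psi^{GA}$ and $\tilde{\Psi}^{GA}$ defined upon 
certificates generated by $\mathrm{CG}_{GA}$; 

- A $\delta$-ASPH system $\Pi_{ASPH}$ = (PG, Com, Hash, PHash) with associated 
covert commitment scheme on relations $\Psi$ and $\Psi^*$; 

- A KR scheme $\Pi_{KR} = (\mathrm{Enc}_\delta, \mathrm{Dec}_\delta)$; 

- A collision-resistant hash function $H$. 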


The scheme $\Pi_{MA}$ = (KG, CG, Auth) then works as follows. 


- KG: Given the security parameter $\lambda$ and the set of identities of group 
members $S$, compute $(gpk, gsk) \leftarrow \mathrm{KG}_{GA}(\lambda)$ and $\pi \leftarrow \mathrm{PG}(\lambda)$. Set $mpk = 
(gpk, \pi)$ and $msk = gsk$. 

- CG(gsk, $i$): Generate a certificate $sk_i \leftarrow \mathrm{CG}_{GA}(gsk, i)$ for the group 
member with identity $i \in S$. 

- Auth($i, j$): $P_i$ and $P_j$ follow the authentication protocol with inputs $(sk_i, i)$ 
and $(sk_j, j)$, respectively. 

1. $P_i$ computes $C_i \leftarrow \mathrm{Com}_{GA}(i, sk_i; r_i)$ and sends $C_i$ to $P_j$. 

2. Let $x_i = (mpk, C_i)$ and $w_i = (sk_i, i, r_i)$. $P_i$ runs the Special Σ-protocol 
$\Sigma = (P_1, P_2, V)$ with input $(x_i, w_i)$ and $P_j$ with input $x_i$. 

   (a) $P_i$ computes $a_i \leftarrow P_1(x_i, w_i; r_1)$, the first message of the Σ-protocol. 
   Then, it computes a commitment to $H(a_i)$ as $b_i \leftarrow \mathrm{Com}(H(a_i), r_2)$ 
   and sends $b_i$ to $P_j$. 

   (b) When $P_j$ sends back a challenge $c_i$, $P_i$ computes the second message 
   $z_i \leftarrow P_2(x_i, w_i, r_1, c_i)$ and sends $z_i$ to $P_j$. 

   (c) $P_j$ computes $a'_i = f_V(x_i, c_i, z_i)$, $(h_i, pk_i) \leftarrow \mathrm{Hash}(b_i, H(a'_i); r_3)$ and 
   $(\eta_i, f_i) = \mathrm{Enc}_\delta(h_i; r_4)$. It sends $(pk_i, f_i)$ to $P_i$ and sets $K_i = \eta_i$. 

   (d) $P_i$ computes $h'_i = \mathrm{PHash}(pk_i, H(a_i), r_2)$ and sets $K'_i = \mathrm{Dec}_\delta(f_i, h'_i)$. 

3. $P_j$ computes $C_j \leftarrow \mathrm{Com}_{GA}(j, sk_j; r_j)$ and sends $C_j$ to $P_i$. 

4. Let $x_j = (mpk, C_j)$ and $w_j = (sk_j, j, r_j)$. $P_j$ runs the Special Σ-protocol 
$\Sigma = (P_1, P_2, V)$ with input $(x_j, w_j)$ and $P_i$ with input $x_j$. 

   (a) $P_j$ computes $a_j \leftarrow P_1(x_j, w_j; r_5)$, the first message of the Σ-protocol. 
   Then, it computes a commitment to $H(a_j)$ as $b_j \leftarrow \mathrm{Com}(H(a_j), r_6)$ 
   and sends $b_j$ to $P_i$. 

   (b) Receiving challenge $c_j$ from $P_i$, it computes the second message 
   $z_j \leftarrow P_2(x_j, w_j, r_5, c_j)$ and sends $z_j$ to $P_i$. 

   (c) $P_i$ computes $a'_j = f_V(x_j, c_j, z_j)$, $(h_j, pk_j) \leftarrow \mathrm{Hash}(b_j, H(a'_j); r_7)$ and 
   $(\eta_j, f_j) = \mathrm{Enc}_\delta(h_j; r_8)$. It sends $(pk_j, f_j)$ to $P_j$ and sets $K_j = \eta_j$. 

   (d) $P_j$ computes $h'_j = \mathrm{PHash}(pk_j, H(a_j), r_6)$ and sets $K'_j = \mathrm{Dec}_\delta(f_j, h'_j)$. 

- The final secret key for $P_i$ is $K'_i \oplus K_j$ and for $P_j$ is $K_i \oplus K'_j$. 
Correctness. Assume that both $P_i$ and $P_j$ have valid group membership 
certificates. First, by the special simulation property of the Special Σ-protocol, we get 
$a'_i = a_i$. Next, by the correctness of the ASPH scheme, we have $\|h_i - h'_i\| < \delta$. 
Then, by the correctness of the KR scheme, we obtain that $K_i = K'_i$. Similarly, 
we can show that $K_j = K'_j$. Hence, at the end of the protocol, $P_i$ and $P_j$ share 
the same secret key. 


Theorem 1 (Internal Covertness). The scheme $\Pi_{MA}$ = (KG, CG, Auth) 
satisfies the internal covertness property if $\Pi_{GA} = (\mathrm{KG}_{GA}, \mathrm{CG}_{GA}, \mathrm{Ver}, \mathrm{Ver}^*, 
\mathrm{Com}_{GA})$ is a covert GA scheme, $\Pi_{ASPH}$ = (PG, Com, Hash, PHash) is a 
$\delta$-ASPH with associated covert commitment scheme and $\Pi_{KR} = (\mathrm{Enc}_\delta, \mathrm{Dec}_\delta)$ 
is a KR scheme. 


In the proof, we let $\$(\mathrm{Com}_{GA})$ be the distribution for the covertness of the 
commitment scheme $\mathrm{Com}_{GA}$ and $\$(\mathrm{Com})$ be the distribution for the covertness 
of the commitment scheme Com. Let $\$(U_f)$ be the uniform distribution over the 
range of $f$ from $\mathrm{Enc}_\delta$. Let $\$(\Sigma)$ be the distribution over the response $z_i$ in the 
Σ-protocol. The distribution $\$(U_{pk})$ is as defined in Sect. 2.3. 


Proof. As there is a symmetry in the authentication protocol, we assume that 
the adversary $\mathcal{A}$ plays the role of $P_j$. Suppose that $\mathcal{A}$ can distinguish between 
$G_{\mathcal{A}}(1^\tau, 0)$ and $G_{\mathcal{A}}(1^\tau, 1)$ with advantage $\epsilon$. Let $G_{\mathcal{A}}(1^\tau, b, i^*)$ be a game which 
follows $G_{\mathcal{A}}(1^\tau, b)$, except that if the adversary queries Test($i$) for $i \neq i^*$ then it halts and 
outputs 1. It is easy to see that there exists an identity $i^*$ for which adversary $\mathcal{A}$ 
distinguishes between $G_0 = G_{\mathcal{A}}(1^\tau, 0, i^*)$ and $G_1 = G_{\mathcal{A}}(1^\tau, 1, i^*)$ with advantage 
at least $\epsilon/N$, where $N$ is the group size. In the rest of the proof, we show 
that the distinguishing advantage between $G_0$ and $G_1$ is negligible via a succession 
of games. 


Game $G_2$: Let $G_2$ be the game which follows $G_1$, except that in all Auth($i, j$) 
instances of Exec($i$) and Test($i$) queries, we replace $P_i$’s message 
$z_i$ in step (2)(b) by a message sampled from the distribution $\$(\Sigma)$. Let $G_1(t)$ be 
the game that follows $G_2$ in the first $t$ Exec queries while the remaining ones 
are as in $G_1$. The only difference between $G_1(t)$ and $G_1(t-1)$ is in the message $z_i$, 
and the covertness of the Special Σ-protocol ensures that $G_1(t)$ and $G_1(t-1)$ are 
indistinguishable. Hence $G_2$ and $G_1$ are indistinguishable. 


Game $G_3$: Let $G_3$ be the game which follows $G_2$, except that in all Auth($i, j$) 
instances of Exec($i$) and Test($i$) queries, we replace $P_i$’s message 
$b_i$ in step (2)(a) by a message sampled from the distribution $\$(\mathrm{Com})$. Similarly, the 
covertness of Com implies that $G_3$ and $G_2$ are indistinguishable. 


Game $G_4$: Let $G_4$ be the game which follows $G_3$, except that in all Auth($i, j$) 
instances of Exec($i$) and Test($i$) queries, we replace $P_i$’s message 
$C_i$ in step (1) by a message sampled from the distribution $\$(\mathrm{Com}_{GA})$. Similarly, the 
covertness of $\mathrm{Com}_{GA}$ implies that $G_4$ and $G_3$ are indistinguishable. 

Note that, in game $G_4$, the response to each Auth($i, j$) instance of Exec($i$) and 
Test($i$) queries is sampled from $\$(\mathrm{Com}_{GA})$ in step (1), $\$(\mathrm{Com})$ in step (2)(a), and 
$\$(\Sigma)$ in step (2)(b), and steps (3)-(4) depend only on the adversary’s response. 
Hence, game $G_4$ can be easily simulated using the public information. 


Game $G_5$: Let $G_5$ be the game that follows $G_4$, except that in all Auth($i^*, j$) instances 
triggered by Test($i^*$), we replace $P_i$’s message $(pk_j, f_j)$ in step (4)(c) by 
uniformly random elements from the respective domains. Let $\epsilon_1$ be the advantage by 
which the adversary can distinguish between $G_5$ and $G_4$. The only difference between 
these two games is in $(f_j, pk_j)$, and from the property of the KR scheme we know that 
if $(h_j, pk_j)$ is uniformly random then $(f_j, pk_j)$ is uniformly random. So, adversary 
$\mathcal{A}$ can distinguish between $(f_j, pk_j)$ from $G_5$ and $G_4$ only if $(h_j, pk_j)$ is not 
uniformly distributed in game $G_4$. For $b_j$, $z_j$ and $c_j$ from game $G_4$, if 
$((b_j, H(a'_j)), \cdot) \notin \Psi^*$ where $a'_j = f_V(x_j, c_j, z_j)$, then by the soundness property 
of the ASPH scheme, $(h_j, pk_j)$ in game $G_4$ is statistically indistinguishable from a 
uniformly random string, and $(f_j, pk_j)$ is also statistically indistinguishable from a 
uniformly random string. Let $\epsilon_{ASPH}$ be the negligible advantage the adversary can 
have in this. Hence with probability $\epsilon_2 = \epsilon_1 - \epsilon_{ASPH}$, a random interaction in 
game $G_4$ with the adversary yields $(b_j, c_j, z_j)$ such that $((b_j, H(a'_j)), \cdot) \in \Psi^*$. We fix 
the adversary’s initial randomness and run the interaction twice, forking after the adversary 
outputs $b_j$. With probability at least $\epsilon_2^2$, we get two transcripts 
$(b_j, c_j, z_j, \tilde{c}_j, \tilde{z}_j)$ such that $a' = f_V(x_j, c_j, z_j)$, $\tilde{a}' = f_V(x_j, \tilde{c}_j, \tilde{z}_j)$ and there exist 
$r$ and $\tilde{r}$ satisfying $((b_j, H(a')), r) \in \Psi^*$ and $((b_j, H(\tilde{a}')), \tilde{r}) \in \Psi^*$. With probability at 
least $(1 - \epsilon_3)$ (over the public parameters of the scheme Com), the commitment scheme 
is perfectly binding over the relation $\Psi^*$, and it thus implies that $H(a') = H(\tilde{a}')$. Let 
$\epsilon_{col}$ be the upper bound on the probability that the hash values of different $a'$ and 
$\tilde{a}'$ produce a collision. Hence with probability $\epsilon_4 = \epsilon_2^2 - \epsilon_3 - \epsilon_{col}$, the adversary gets 
$(x_j, a', c_j, \tilde{c}_j, z_j, \tilde{z}_j)$ such that $c_j \neq \tilde{c}_j$ and $V(x_j, a', c_j, z_j) = V(x_j, a', \tilde{c}_j, \tilde{z}_j) = 1$. 
By the special soundness property of the Special Σ-protocol, the adversary can 
extract $w$ such that $(x_j, w) \in \tilde{\Psi}^{GA}$. If $\epsilon_4$ is non-negligible, then it breaks the 
Strong Unforgeability of the scheme $\Pi_{GA}$. Hence, $\epsilon_4$ is negligible, implying that 
$\epsilon_1$ is also negligible, because $\epsilon_{ASPH}$, $\epsilon_{col}$ and $\epsilon_3$ are negligible. 


Game $G_6$: For each Auth($i, j$) query triggered by Exec($i$) in game $G_5$, the values 
$C_i$ are sampled from $\$(\mathrm{Com}_{GA})$. Game $G_6$ exactly follows $G_5$, except that we revert 
this change by setting $C_i \leftarrow \mathrm{Com}_{GA}(i, sk_i)$. By a similar argument as used 
between $G_4$ and $G_3$, we get that $G_5$ and $G_6$ are indistinguishable. 


Game $G_7$: For each Auth($i, j$) query triggered by Exec($i$) in game $G_6$, in step 
(2)(a), we sample $b_i$ from the distribution $\$(\mathrm{Com})$. In game $G_7$, we revert this 
change by setting $b_i \leftarrow \mathrm{Com}(H(a_i))$. By a similar argument as used between 
$G_3$ and $G_2$, we get that $G_6$ and $G_7$ are indistinguishable. 


Game $G_8$: For each Auth($i, j$) query triggered by Exec($i$) in game $G_7$, in 
step (2)(b), we sample $z_i$ from the distribution $\$(\Sigma)$. In game $G_8$, we revert 
this change by setting $z_i \leftarrow P_2(x_i, w_i, r_1, c_i)$. By a similar argument as used 
between $G_2$ and $G_1$, we get that $G_7$ and $G_8$ are indistinguishable. Note that 
game $G_8$ is the same as $G_0$, and by the succession of games we have shown that 
games $G_0$ and $G_1$ are indistinguishable. 
Theorem 2 (External Covertness). The given scheme $\Pi_{MA}$ = (KG, CG, 
Auth) satisfies the external covertness property if $\Pi_{GA} = (\mathrm{KG}_{GA}, \mathrm{CG}_{GA}, \mathrm{Ver}, 
\mathrm{Ver}^*, \mathrm{Com}_{GA})$ is a covert GA scheme, $\Pi_{ASPH}$ = (PG, Com, Hash, PHash) 
is a $\delta$-ASPH with associated covert commitment scheme and $\Pi_{KR} = 
(\mathrm{Enc}_\delta, \mathrm{Dec}_\delta)$ is a KR scheme. 


We defer the proof to the full version. 


Round Complexity. It is easy to see that step 1 and step 2(a) can be combined in 
one round. Similarly, step 3 and step 4(a) can be combined. The Auth protocol 
can be executed in 5 rounds. In the first round, $P_i$ sends $C_i$ and $b_i$ to $P_j$. In the 
second round, $P_j$ sends $c_i$, $C_j$ and $b_j$ to $P_i$. In the third round, $P_i$ sends $c_j$ and $z_i$ 
to $P_j$. In the fourth round, $P_j$ sends $z_j$ and $(pk_i, f_i)$ to $P_i$. In the fifth round, 
$P_i$ sends $(pk_j, f_j)$ to $P_j$. In the Random Oracle Model (ROM), the protocol 
can be executed in three rounds if $c_i$ and $c_j$ are computed as $c_i = H'(x_i, b_i)$ 
and $c_j = H'(x_j, b_j)$ for a hash function $H'$ onto $\{0,1\}^\tau$ modeled as a random 
oracle. The only issue arises in Game $G_5$ of Theorem 1 (Internal Covertness), 
where the adversary forks two transcripts with the same commitment. By using the general 
forking lemma from [4], if the adversary makes at most $q_H$ hash queries then we 
get an algorithm that creates the same two transcripts with probability at least 
$\epsilon_2 \left( \frac{\epsilon_2}{q_H} - \frac{1}{2^\tau} \right)$. The rest of the proof of internal covertness follows as it is. 


4 Some Background on Lattices 


Let $d > 0$ be a power of 2 and $q$ be a prime. Define the rings $R := \mathbb{Z}[X]/(X^d+1)$ 
and $R_q := \mathbb{Z}_q[X]/(X^d + 1)$. For any element $z = \sum_i z_i X^i \in R$, the $\ell_p$ norm 
of $z$, for $1 \le p < \infty$, is defined as $\|z\|_p := \left(\sum_i |z_i|^p\right)^{1/p}$, while its $\ell_\infty$ norm is 
defined as $\|z\|_\infty := \max_i \{|z_i|\}$. To compute the norm of an element $z \in R_q$, we 
use the unique representation where $z_i \in \left[-\frac{q-1}{2}, \frac{q-1}{2}\right]$ for each coefficient of $z$. 
The norm definition can be naturally extended to vectors in $R^k$. 

We use lowercase bold letters to denote a column vector over $R_q$ and uppercase 
bold letters to denote a matrix over $R_q$. For a vector $\mathbf{x}$, its $i$-th coordinate 
is denoted by $x_i$. For a matrix $\mathbf{M}$, we denote by $\mathbf{M}_j$ its $j$-th column and by $M_{ij}$ 
the element at its $i$-th row and $j$-th column. For any probability distribution $D$, we 
use the notation $x \leftarrow D$ to denote that $x$ is sampled with probability $D(x)$. When 
$S$ is a finite set, we use the notation $x \xleftarrow{\$} S$ to denote that $x$ is sampled uniformly 
at random from $S$. For probability distributions $\mathcal{X}$ and $\mathcal{Y}$ over a countable set 
$S$, we use $\Delta(\mathcal{X}, \mathcal{Y})$ to denote the statistical distance between $\mathcal{X}$ and $\mathcal{Y}$, which is 
defined as 

$$\Delta(\mathcal{X}, \mathcal{Y}) = \frac{1}{2} \sum_{x \in S} \left| \Pr[\mathcal{X} = x] - \Pr[\mathcal{Y} = x] \right|.$$
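
For finite distributions, the statistical distance can be computed directly from its definition. The helper below is a small illustrative sketch: the function name `statistical_distance` and the dictionary representation of a distribution (outcome mapped to probability) are assumptions made here for the example, and exact rational arithmetic is used to avoid floating-point rounding.

```python
from fractions import Fraction

def statistical_distance(X, Y):
    """Half the L1 distance between two finite distributions given as
    dictionaries mapping outcomes to probabilities."""
    support = set(X) | set(Y)
    return sum(abs(X.get(s, 0) - Y.get(s, 0)) for s in support) / 2

# A fair coin versus a coin that shows 0 with probability 3/4.
fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(3, 4), 1: Fraction(1, 4)}
```

For the two coins above, the distance is $\frac{1}{2}(\frac{1}{4} + \frac{1}{4}) = \frac{1}{4}$, and the distance of any distribution to itself is 0.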

For any $\beta \in \mathbb{R}_{>0}$, we use $S_\beta$ to denote the set of ring elements with infinity 
norm less than or equal to $\beta$, i.e., $S_\beta = \{a \in R \mid \|a\|_\infty \le \beta\}$. We will use the 
following bounds [3,36]: 
- If $\|f\|_\infty \le \beta$ and $\|g\|_1 \le \gamma$ then $\|f \cdot g\|_\infty \le \beta \cdot \gamma$. 
- If $\|f\|_2 \le \beta$ and $\|g\|_2 \le \gamma$ then $\|f \cdot g\|_\infty \le \beta \cdot \gamma$. 
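
The first of these bounds is easy to check numerically for small parameters. The sketch below is illustrative only: the helper names (`centered`, `norm_inf`, `norm_1`, `ring_mul`, `check_bound`) and the toy parameters $d = 8$, $q = 257$ are assumptions made for this example. Ring elements are lists of $d$ coefficients, multiplication is negacyclic (using $X^d = -1$), and norms are taken on centered representatives.

```python
import random

def centered(c, q):
    """Centered representative of c mod q in [-(q-1)/2, (q-1)/2] (q odd)."""
    c %= q
    return c - q if c > q // 2 else c

def norm_inf(z, q):
    """Infinity norm of a ring element given by its coefficient list."""
    return max(abs(centered(c, q)) for c in z)

def norm_1(z, q):
    """l_1 norm of a ring element given by its coefficient list."""
    return sum(abs(centered(c, q)) for c in z)

def ring_mul(f, g, q):
    """Multiply in Z_q[X]/(X^d + 1): negacyclic convolution of coefficients."""
    d = len(f)
    out = [0] * d
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < d:
                out[i + j] += a * b
            else:
                out[i + j - d] -= a * b   # wrap around with X^d = -1
    return [c % q for c in out]

def check_bound(d=8, q=257, trials=200, seed=1):
    """Check ||f*g||_inf <= ||f||_inf * ||g||_1 on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        f = [rng.randrange(q) for _ in range(d)]
        g = [rng.randint(-2, 2) % q for _ in range(d)]   # short element
        if norm_inf(ring_mul(f, g, q), q) > norm_inf(f, q) * norm_1(g, q):
            return False
    return True
```

The inequality already holds for the integer product of the centered representatives, and reducing modulo $q$ to the centered range can only shrink absolute values, so `check_bound()` succeeds for every sample.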


We will use the following result about the factorization of a cyclotomic polyno- 
mial modulo a prime number. 


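Theorem 3. [35, Corollary 1.2] Let $d \ge k > 1$ be powers of 2 and $q \equiv 2k+1 
\pmod{4k}$ be a prime. Then the polynomial $X^d + 1$ factors as 


$$X^d + 1 \equiv \prod_{j=1}^{k} \left(X^{d/k} - r_j\right) \pmod q$$


for distinct $r_j \in \mathbb{Z}_q \setminus \{0\}$, where $X^{d/k} - r_j$ is irreducible in $\mathbb{Z}_q[X]$. Furthermore, 
any $y \in \mathbb{Z}_q[X]/(X^d + 1)$ that satisfies $0 < \|y\|_\infty < \frac{1}{\sqrt{k}} q^{1/k}$ has an inverse in 
$\mathbb{Z}_q[X]/(X^d + 1)$. 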
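
For $k = 2$ the condition reads $q \equiv 5 \pmod 8$, and the factorization can be verified by hand: if $r^2 \equiv -1 \pmod q$, then $(X^{d/2} - r)(X^{d/2} + r) = X^d - r^2 \equiv X^d + 1$. The following sketch checks this for toy parameters; the function names and the choice $q = 13$, $d = 8$ are assumptions made for the example, and the code verifies only the product identity, not the irreducibility of the factors.

```python
def sqrt_of_minus_one(q):
    """Smallest r in [1, q) with r^2 = -1 mod q (exists when q = 1 mod 4)."""
    for r in range(1, q):
        if (r * r + 1) % q == 0:
            return r
    raise ValueError("no square root of -1 modulo q")

def poly_mul(f, g, q):
    """Multiply two polynomials over Z_q; coefficients are listed low to high."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return out

def check_factorization(d, q):
    """Check (X^(d/2) - r)(X^(d/2) + r) = X^d + 1 mod q, with r^2 = -1 mod q."""
    r = sqrt_of_minus_one(q)
    half = d // 2
    f1 = [(-r) % q] + [0] * (half - 1) + [1]   # X^(d/2) - r
    f2 = [r] + [0] * (half - 1) + [1]          # X^(d/2) + r
    target = [1] + [0] * (d - 1) + [1]         # X^d + 1
    return poly_mul(f1, f2, q) == target, r
```

Here the two roots of Theorem 3 are $r_1 = r$ and $r_2 = -r$. For $q = 13$ the search returns $r = 5$, since $5^2 + 1 = 26 \equiv 0 \pmod{13}$.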


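Discrete Gaussian: For any $\sigma > 0$, $k \in \mathbb{Z}_{>0}$ and $\mathbf{y} \in R^k$, for all $\mathbf{x} \in R^k$, define 
$\rho_{\sigma,\mathbf{y}}(\mathbf{x}) := \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right)$. For any discrete set $S \subseteq R^k$, we extend the definition 
as $\rho_{\sigma,\mathbf{y}}(S) = \sum_{\mathbf{x} \in S} \rho_{\sigma,\mathbf{y}}(\mathbf{x})$. We use $\mathbf{x} \leftarrow D^k_{\sigma,\mathbf{y}}$ to denote that 


$$\Pr_{\mathbf{u} \leftarrow D^k_{\sigma,\mathbf{y}}}[\mathbf{u} = \mathbf{x}] := \frac{\rho_{\sigma,\mathbf{y}}(\mathbf{x})}{\rho_{\sigma,\mathbf{y}}(R^k)},$$


namely, $\mathbf{x}$ is sampled from $R^k$ with probability proportional to $\rho_{\sigma,\mathbf{y}}(\mathbf{x})$. We omit 
the parameter $\mathbf{y}$ when $\mathbf{y} = 0$. We will use the following lemma from [2,3,34]. 


Lemma 1. For any $\delta, \sigma \in \mathbb{R}^+$ and $k, d \in \mathbb{Z}^+$, 


$$\Pr\left[\|\mathbf{x}\| > \delta\sigma\sqrt{kd} \;\middle|\; \mathbf{x} \leftarrow D^k_\sigma\right] < \delta^{kd} \cdot \exp\left(\frac{kd}{2}\left(1 - \delta^2\right)\right).$$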


Computational Assumptions: We will work with a ring $R_q = \mathbb{Z}_q[X]/(X^d + 1)$ 
(where $d$ is a power of 2), and the security of our construction is based on the 
hardness of module variants [8,30] of the Short Integer Solution (SIS) problem [1] 
and the Learning With Errors (LWE) problem [38], as well as on the hardness 
of the NTRU problem [23]. For convenience, we define the M-SIS problem in 
the $\ell_2$ and the $\ell_\infty$ norm, and the M-LWE problem only in the $\ell_\infty$ norm. 

Definition 3. For any $n, m, q \in \mathbb{Z}^+$, $p \in \{2, \infty\}$ and $\beta \in \mathbb{R}^+$, the M-SIS$^p_{n,m,q,\beta}$ 
problem is defined as follows: Given a uniformly random matrix $\mathbf{A}$ over $R_q$, find a vector $\mathbf{z}$ over $R_q$ such that 
$[\mathbf{I}_n \; \mathbf{A}]\, \mathbf{z} = 0$ and $\beta \ge \|\mathbf{z}\|_p > 0$. 


Due to space constraints, we provide the reminder on other used 
computational assumptions and rejection sampling techniques in the full version. 
5 Approximate Smooth Projective Hashing from M-LWE 


In this section, we construct an Approximate Smooth Projective Hashing 
(ASPH) scheme with a covert commitment. We adapt the ideas used in the 
PAKE scheme by Katz and Vaikuntanathan [26], whose security relies on the 
LWE assumption. We observe that the encryption method used in [26] can also 
be seen as a commitment mechanism. 

In Sect. 5.1, we provide several technical lemmas. Using them, we construct 
an M-LWE-based covert commitment scheme in Sect. 5.2 and a $\delta$-ASPH scheme 
in Sect. 5.3. 


5.1 Supporting Lemmas 


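In this section, we assume that $q$ is a prime satisfying $q \equiv 5 \bmod 8$. We follow 
the technique from [30] to prove the following two lemmas. 


Lemma 2. Let $q$ be a prime satisfying $q \equiv 5 \bmod 8$. For $\mathbf{B} \xleftarrow{\$} R_q^{m \times k}$, with 
probability at most $q^{kd - dm/2}(1 + 4^{-md})$, we have: 


$$\min_{\mathbf{s} \in R_q^k \setminus \{0\}} \|\mathbf{B}\mathbf{s}\|_\infty < \frac{\sqrt{q}}{4}.$$


Proof. First we calculate the probability of $0 < \min_{\mathbf{s} \in R_q^k \setminus \{0\}} \|\mathbf{B}\mathbf{s}\|_\infty < \frac{\sqrt{q}}{4}$. By 
the union bound, we get 


$$\sum_{\substack{\mathbf{t} \in R_q^m \\ 0 < \|\mathbf{t}\|_\infty < \sqrt{q}/4}} \sum_{\mathbf{s} \in R_q^k} \Pr_{\mathbf{B} \leftarrow R_q^{m \times k}}(\mathbf{B}\mathbf{s} = \mathbf{t}) = \sum_{\substack{\mathbf{t} \in R_q^m \\ 0 < \|\mathbf{t}\|_\infty < \sqrt{q}/4}} \sum_{\mathbf{s} \in R_q^k} \prod_{i \le m} \Pr_{\mathbf{b}_i \leftarrow R_q^k}\left(\mathbf{b}_i^T \mathbf{s} = t_i\right).$$


From Theorem 3, we know that $X^d + 1$ factors into two irreducible polynomials 
$f_1 = X^{d/2} - r_1$ and $f_2 = X^{d/2} - r_2$ in $R_q$. Hence by the Chinese Remainder 
Theorem (CRT), we have $R_q \cong \mathbb{F}_{q^{d/2}} \times \mathbb{F}_{q^{d/2}}$. The equality $\mathbf{b}_i^T \mathbf{s} = t_i$ holds iff it 
holds for both the CRT components. If $\mathbf{s}$ is nonzero in a CRT component then 
the equation holds with probability at most $q^{-d/2}$ in that component. Notice 
that, if $t_i$ is non-zero then Theorem 3 implies that $t_i$ is also non-zero in both 
the CRT components as $\|t_i\|_\infty < \sqrt{q}/4$. As $\mathbf{t} \neq 0$, it implies that $\mathbf{s}$ should be 
non-zero on both CRT components to satisfy $\mathbf{b}_i^T \mathbf{s} = t_i$ for $i$ where $t_i \neq 0$. So the 
probability can be upper bounded by 


$$\sum_{\substack{\mathbf{t} \in R_q^m \\ 0 < \|\mathbf{t}\|_\infty < \sqrt{q}/4}} \sum_{\mathbf{s} \in R_q^k} \prod_{i \le m} q^{-d} \le q^{kd - dm/2} \cdot 4^{-md}.$$


Now we only need to bound the probability of $\min_{\mathbf{s} \in R_q^k \setminus \{0\}} \|\mathbf{B}\mathbf{s}\|_\infty = 0$. Notice 
that $\mathbf{s}$ is non-zero in at least one of the CRT components. By a simple 
probabilistic argument, we can also bound this probability by $q^{kd} q^{-md/2}$. Hence the 
result follows. 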
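Lemma 3. Let $q$ be a prime satisfying $q \equiv 5 \bmod 8$. Given $\mathbf{B} \in R_q^{m \times k}$, for 
$\mathbf{a} \xleftarrow{\$} R_q^m$, with probability at most $q^{(k+1)d - dm/2} \cdot 4^{-md}$, we have 


$$\min_{\substack{z \in R_q \setminus \{0\},\ \mathbf{s} \in R_q^k \\ \|z\|_\infty < \sqrt{q}/4}} \|z\mathbf{a} + \mathbf{B}\mathbf{s}\|_\infty < \frac{\sqrt{q}}{4}.$$


Due to space restrictions, we defer the proof of Lemma 3 to the full version. 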
We additionally need the following result from [25]. 
Theorem 4. /25, Theorem 8] Let x€ N,£ >0, BE RXR and 
g > Vlde) If mingers\ {oy | Bs|loo > x then A(f* B,U) < 2e where 


x 
f — (Dro) and U is uniform distribution over RISK, 


Let M := (hh) C RG. We will require the following lemma to prove 


the binding property of our commitment scheme. 


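Lemma 4. For all but an at most $2^{-md}$ fraction of $(a_0, A_1, A_2)$ over
$(R_q^m \times R_q^{m \times n} \times R_q^{m \times k})$, there does not exist
$(c, m, r, z, m^*, r^*, z^*) \in (R_q^m \times M \times R_q^k \times R_q \times M \times R_q^k \times R_q)$
such that $m \neq m^*$, and


$$\max\left\{\|z\|_\infty,\; \|z(c - a_0 - A_1 m) - A_2 r\|_\infty\right\} < \frac{\sqrt{q}}{4}$$


and


$$\max\left\{\|z^*\|_\infty,\; \|z^*(c - a_0 - A_1 m^*) - A_2 r^*\|_\infty\right\} < \frac{\sqrt{q}}{4}.$$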


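Proof. Let $A' := [a_0 \;\; A_1]$. Fix some $c, m, m^*$ such that $m \neq m^*$, and let 


$$y := c - a_0 - A_1 m = c - A' \begin{bmatrix} 1 \\ m \end{bmatrix},$$


and

$$y^* := c - a_0 - A_1 m^* = c - A' \begin{bmatrix} 1 \\ m^* \end{bmatrix}.$$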
Let fı = aah. fo= aah, , where X?/2—r; and X%/?—rp are irreducible 


factors of $X^d + 1$ over $\mathbb{Z}_q$ as stated in Theorem 3. From the description of message 
space M, we get that $m \not\equiv m^* \bmod f_1$ and $m \not\equiv m^* \bmod f_2$. 


As $m \neq m^*$, we get that $\begin{bmatrix} 1 \\ m \end{bmatrix}$ and $\begin{bmatrix} 1 \\ m^* \end{bmatrix}$ are linearly independent. Therefore, 
for a uniformly random choice of $a_0$ and $A_1$, we have that $y$ and $y^*$ are uniformly 
random and independent. 
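Let $E_1$ be the event that

$$\min_{\substack{s \in R_q^k,\; z \in R_q \\ \text{s.t. } 0 < \|z\|_\infty < \sqrt{q}/4}} \|yz + A_2 s\|_\infty < \frac{\sqrt{q}}{4}$$

and $E_2$ be the event that

$$\min_{\substack{s \in R_q^k,\; z^* \in R_q \\ \text{s.t. } 0 < \|z^*\|_\infty < \sqrt{q}/4}} \|y^* z^* + A_2 s\|_\infty < \frac{\sqrt{q}}{4}.$$

From Lemma 3, we get $\Pr_{a_0, A_1, A_2}[E_1 \text{ and } E_2] \le q^{2(k+1)d - md} \cdot 2^{-4md}$. 

Now, using the union bound over $c, m, m^*$, we deduce that, with at most 


$$q^{md + 2nd} \cdot q^{2(k+1)d - md} \cdot 2^{-4md} \le q^{2(k+n+1)d} \cdot 2^{-4md} < 2^{-md}$$


probability over the uniform choice of $(a_0, A_1, A_2)$ over $(R_q^m \times R_q^{m \times n} \times R_q^{m \times k})$, 
there exists $(c, m, r, z, m^*, r^*, z^*)$ such that $m \neq m^*$ and 


$$\|z(c - a_0 - A_1 m) - A_2 r\|_\infty < \frac{\sqrt{q}}{4}, \qquad 0 < \|z\|_\infty < \frac{\sqrt{q}}{4}$$

and 


$$\|z^*(c - a_0 - A_1 m^*) - A_2 r^*\|_\infty < \frac{\sqrt{q}}{4}, \qquad 0 < \|z^*\|_\infty < \frac{\sqrt{q}}{4}.$$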


5.2 Covert Commitments from M-LWE 

Let us first describe the commitment scheme. 

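- PG($\lambda$): Given the security parameter $\lambda$, choose $k, n \in \mathbb{Z}^+$, $m > (k + n + 1)\log q \in \mathbb{Z}$,
$\beta < \sqrt{q}/4 \in \mathbb{R}^+$, $a_0 \xleftarrow{\$} R_q^m$, $A_1 \xleftarrow{\$} R_q^{m \times n}$, and $A_2 \xleftarrow{\$} R_q^{m \times k}$. 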
Let M = (l) cps. 


- Com(m; r, e): For a message $m \in M$, sample vectors $r \xleftarrow{\$} R_q^k$ and $e \leftarrow S_\beta^m$. 
Output the commitment 


$$\mathrm{Com}(m; r, e) = c = a_0 + A_1 m + A_2 r + e.$$


- Ver(c, m, r, z): Output 1 if $\|z(c - a_0 - A_1 m) - A_2 r\|_\infty < \sqrt{q}/4$, $z \in R_q$,
$0 < \|z\|_\infty < \sqrt{q}/4$, and $m \in M$; otherwise output 0. 


Covertness of the commitment (which implies the computational hiding property)
directly relies on the M-LWE$_{m,k,q,\beta}$ assumption. We get the statistical 
binding property as a corollary of Lemma 4. 


5.3 δ-ASPH Scheme 


We construct a δ-ASPH scheme on relations 


y := {((e,m), r, 1) jee RJ m EM, r RE, lle ao — Aim — Avr || < B} 
6 and 
Ww = {((e,m),r,z) [cE RP meM,re RE, Ver(c, m,r, z) = 1} ; 


5 We choose such a message space M to make sure that there does not exist $m, m' \in M$
such that $m \neq m'$ but either $m \equiv m' \bmod f_1$ or $m \equiv m' \bmod f_2$ where 


j= a f = qs Here X%/? — rı and X?’ — rg are irreducible 


factors of $X^d + 1$ over $\mathbb{Z}_q$ as stated in Theorem 3. We are using this condition in 
Lemma 4. 


6 Set of commitment, message and witness generated by an honest party. 

- The public parameters consist of $\varepsilon \in \mathbb{R}^+$, $\sigma > 4\sqrt{q \log(2d(1 + 1/\varepsilon))/\pi}$ and 
$\delta := \beta(m+1) \cdot \sigma\sqrt{2d}$. 


- Hash(c, m; f): Given commitment c and message m, first sample $f \leftarrow D_\sigma^{m+1}$,
then compute the hash value $h = f^T \left((c - a_0 - A_1 m)^T, 1\right)^T$ and 
output the projection key $pk := \left(f^T (A_2^T \;\; 0)^T\right)^T$. 
- PHash(pk, m, r): Given the projection key pk, message m and witness r 
for commitment c, compute the hash value as $h' = pk^T \cdot r$. 


Correctness. Assume that we are given c, a commitment to message m with 
witness r, i.e., $\|c - a_0 - A_1 m - A_2 r\|_\infty \le \beta$. This implies that 

$$h - h' = f^T \left((c - a_0 - A_1 m)^T, 1\right)^T - f^T (A_2^T \;\; 0)^T r
= f^T \left((c - a_0 - A_1 m - A_2 r)^T, 1\right)^T.$$

Let $f^T = (f_1, \ldots, f_{m+1})$. As vector $f$ is from a Gaussian distribution, by 
Lemma 1 with probability at least $(1 - 2^{-d/7})^{m+1} > 1 - (m + 1) \cdot 2^{-d/7}$, we 
have $\forall i \in [m+1]$, $\|f_i\|_2 < \sigma\sqrt{2d}$. It implies that, with probability at least 
$(1 - m \cdot 2^{-d/7})$, it holds that $\|h - h'\|_\infty < \beta(m+1) \cdot \sigma\sqrt{2d} = \delta$. 


Soundness. Let c be a commitment and let message m be such that there does 
not exist (r, z) such that Ver(c,m,r, z) = 1 i.e. 


Y(r,z) € a : ||z(c— ao — Arm) — Ar loo > VG/4 or ||Z|]o0 ¢ (0, /a/4]. (1) 


We want to show that (h, pk) = Cas ((¢ — a- Aym)’, 1)" (fF (Ae 0)7)7) 
is statistically indistinguishable from R¥+*. Let B := [A; t] € RAX EEI where 
A = [A7 0j? and t = ((e- ao — Aim)? 1)”. Lemma 2 implies that with 
probability at least (1 — Z= we have 


Ys € RG \ {0}, ||Azslloo > VG/4 i.e. Ys € Rg \ {0}, lAzsll > vg/4. (2) 


Therefore, from Eq. 1 and 2, we get the Ys € REHH\{0}, ||Bs|lo0 > /q/4. Hence 
Theorem 4 implies that A( fT B,U) < 2e, where U È Re 


Covertness. From Eq. 2 and Theorem 4, we get that $\Delta(f^T A, U) < 2\varepsilon$ where 
$U \xleftarrow{\$} R_q^{1 \times k}$. Hence, the covertness property follows. 

Abstract. In the shuffle model for differential privacy, n users locally 
randomize their data and submit the results to a trusted “shuffler” who 
mixes the results before sending them to a server for analysis. This is 
a promising model for real-world applications of differential privacy, as 
several recent results have shown that, in some cases, the shuffle model 
offers a strictly better privacy/utility tradeoff than what is possible in a 
purely local model. 

A downside of the shuffle model is its reliance on a trusted shuffler, 
and it is natural to try to replace this with a distributed shuffling pro- 
tocol run by the users themselves. While it would of course be possible 
to use a fully secure shuffling protocol, one might hope to instead use a 
more-efficient protocol having weaker security guarantees. 

In this work, we consider a relaxation of secure shuffling called differ- 
ential obliviousness that we prove suffices for differential privacy in the 
shuffle model. We also propose a differentially oblivious shuffling proto- 
col based on onion routing that requires only O(n log n) communication 
while tolerating any constant fraction of corrupted users. We show that 
for practical settings of the parameters, our protocol outperforms exist- 
ing solutions to the problem. 


Keywords: Differential privacy - Onion routing 


1 Introduction 


Differential privacy [19] has become a leading approach for privacy-preserving 
data analysis. Traditional mechanisms for differential privacy operate in the 
curator model, where a trusted server holds all the sensitive data and releases 
noisy statistics about that data. To reduce the necessary trust assumptions, 
researchers subsequently proposed the local model of differential privacy. Here, 
each user applies a local randomizer $\mathcal{R}$ to its sensitive data $x_i$ to obtain a noisy 
result $y_i$, and then forwards $y_i$ to a server who analyzes all the noisy data it 
obtains. A drawback of local mechanisms is that, in some cases, they provably 
require more noise (and hence offer reduced utility) than mechanisms in the 
curator model for a fixed level of privacy. For example, computing a differentially 
private mean of n users’ inputs can be done with $O(1)$ noise in the centralized 
curator model [19] but requires $\Omega(\sqrt{n})$ noise in the local model [5,13]. 

A recent line of work has explored an intermediate model that provides a 
tradeoff between these extremes. In the shuffle model [4,8,16,36], users locally 
add noise to their data as in the local model, but also have access to a trusted 
entity S (a “shuffler”) that anonymizes their data before it is forwarded to the 
server. That is, whereas in the local model the server obtains the ordered vector 
of noisy inputs $(y_1, \ldots, y_n)$, in the shuffle model the server is given only the 
multiset $\{y_i\} := S(y_1, \ldots, y_n)$ which hides information about which element was 
contributed by which user. (The $\{y_i\}$ can be encrypted with the server’s public 
key before being sent to the shuffler so the shuffler does not learn the value 
submitted by any user.) Balle et al. [4] analyze the result of composing a local 
differentially private mechanism with a shuffler, and show a setting where the 
shuffle model offers a strictly better privacy/utility tradeoff than what is possible 
in the local model. 

Although the shuffle model relies on a weaker trust assumption than the 
curator model, it may still be undesirable to rely on a trusted shuffler who is 
assumed not to collude with the curator. It is thus natural to consider replacing 
the shuffler by a distributed protocol executed by the users themselves. Clearly, 
using a fully secure shuffling protocol to instantiate the shuffler preserves the
privacy guarantees of the shuffle model. However, fully secure distributed-shuffling 
protocols are inefficient in practice (see Sect. 1.1). 


Our Contributions. We consider a relaxation of oblivious shuffling that we 
call differential obliviousness. (Prior work has considered the same or similar 
notions in other settings; see Sect. 1.1.) Roughly, for any honest pair of users 
and any pair of values y, y’, a differentially oblivious shuffling protocol hides (in 
the same sense as differential privacy) whether the first user contributed y and 
the second user contributed y’, or vice versa. Generalizing the results of Balle 
et al. [4], we analyze the privacy obtained by composing a local differentially 
private mechanism with any differentially oblivious shuffling protocol, and show 
that such shuffling protocols suffice to replace the trusted shuffler. 

With this result in place, we then seek an efficient differentially oblivious 
shuffling protocol. In the context of anonymous communication, Ando et al. [1] 
show a differentially oblivious shuffling protocol using $O(n \log n)$ communication.¹
Their protocol is based on onion routing, in which each user routes its 


1 Ando et al. consider a “many-to-many” variant of shuffling, where each of the n users 
wants to send a message to a distinct recipient, in contrast to our setting where all 
n inputs are sent to a designated receiver. Nevertheless, their results can be applied 
to our setting with minor modifications, so we ignore the distinction. 
message to the server via a path of randomly chosen users, with nested encryption
being used to hide from each intermediate user everything about the route 
except for the previous and next hops. Ando et al. analyze the privacy of onion 
routing against an adversary who corrupts some fraction of the users in the net- 
work in addition to the server, and who is also assumed able to eavesdrop on 
all communication in the network. While such an adversary may be appropri- 
ate in the context of using anonymous communication to evade state-sponsored 
censorship, we believe it is overkill for most deployments of differential privacy 
that could benefit from the shuffle model. Instead, we consider a weaker adver- 
sary who can only monitor the communications of corrupted users, and analyze 
the differential obliviousness of onion routing in this model. Our analysis uses 
very different techniques from those of Ando et al., and results in better con- 
crete parameters as well as an asymptotic improvement in the average per-user 
communication complexity. 

As in the work of Ando et al., we can adapt our protocol to handle a malicious 
adversary by routing dummy messages alongside real ones and checking partway 
along the route whether any dummy messages have been dropped. Focusing on 
the application to the shuffle model, we observe that the overall privacy degrades 
smoothly if only a few (real) messages are dropped—a dropped message is similar 
to having one less user—and thus a secure protocol only needs to abort when 
many messages are dropped by the adversary. As a consequence, we are able to 
address malicious behavior with lower overhead (compared to the semi-honest 
setting) than Ando et al. 


1.1 Related Work 


Secure Shuffling. There is a long line of work studying secure shuffling proto- 
cols. We survey some of what is known, restricting attention to protocols secure 
against t = O(n) corruptions. 

Fully secure shuffling can be done via secure computation of a permutation 
network [24,32], or by having t + 1 parties sequentially shuffle locally [24,29]. 
Either approach requires $\Omega(n^2)$ communication. While it is possible to improve 
the asymptotic communication complexity to $O(n \log n)$ by using $O(\log n)$-size 
committees (cf. [9,17,31]), the concrete efficiency of that approach is unclear. 

Movahedi et al. [31] considered a relaxed version of shuffling in which security 
may fail completely with probability $O(1/n^3)$; this can be viewed as a form of 
differential obliviousness. The communication complexity of their protocol is 
$O(n \cdot \mathrm{polylog}\, n)$. Their protocol and that of Ando et al. [1] (discussed earlier) 
are the only practical protocols for shuffling we are aware of with sub-quadratic 
communication complexity. 

Bell et al. [6] proposed a different approach for achieving a relaxed form 
of shuffling. Their construction requires $O(n^2)$ communication, which can be 
improved to $O(n \log n)$ for constant-size input domains. To the best of our
knowledge, it has the best concrete efficiency of any prior shuffling protocol. They are 
also motivated by applications to the shuffle model, but do not prove that their 
relaxation provides differential privacy when composed with a local differentially 
private mechanism. Their protocol does not provide a smooth tradeoff between 
privacy and performance as our approach does. 

We provide a concrete comparison between our shuffling protocol and prior 
work in Sects. 4.4 and 5.1. 

In roughly concurrent work, Bünz et al. [10] propose a differentially oblivious 
shuffling protocol that relies on a very strong form of trusted setup. 


Anonymous Communication. Sender-anonymous communication can be 
used to implement oblivious shuffling. DC-nets [15] and mix networks [14], two 
classical approaches for anonymous communication, both require $\Omega(n^2)$
communication for security against a constant fraction of corrupted parties. 

Backes et al. [3] proposed a security definition for anonymous routing inspired 
by differential privacy, and Kuhn et al. [25] gave a definition of security (sender- 
message pair unlinkability) nearly identical to our own definition of differential 
obliviousness. Neither of these works shows new protocols realizing its
definition. Several recent anonymous communication systems [27,34,35] also define 
security in terms of differential privacy, but the per-user communication
complexity of these systems is $\Omega(n)$. None of these works consider how
anonymous-communication protocols compose with other differentially private mechanisms. 

Bellet et al. [7] study “gossip” protocols that provide differential privacy. The 
model they consider is quite different from ours, and they focus on one-to-many 
communication rather than many-to-one communication as we do here. 

The onion routing protocol [1,21,33] that we study in this paper is used 
as part of the Tor anonymous communication network (though Tor uses paths 
with only three intermediate nodes). Although Tor has received a lot of attention 
in the security community, most of that work focuses on active attacks and/or 
attacks that are specific to Tor. While some theoretical analyses of the anonymity 
provided by onion routing exist [1,2,11,18,20,26,28], none (other than the work 
of Ando et al. [1]) prove differential obliviousness. 


Differentially Private Computation. The idea of relaxing security for dis- 
tributed protocols in the context of differential privacy has appeared in a number 
of prior works [5,12,22,23,29,30]. Beimel et al. [5] first proposed the idea, and 
studied how the relaxation impacts efficiency for the problem of secure sum- 
mation. He et al. [23] and Groce et al. [22] construct differentially private set- 
intersection protocols that are more efficient than fully secure protocols for the 
same task. Mazloom and Gordon [29], and Mazloom et al. [30] leverage differen- 
tial privacy to make graph-parallel computations more efficient. Chan et al. [12] 
consider a version of differential obliviousness (defined differently from ours) in 
the client/server model, studying sorting, merging, and range-query data struc- 
tures under that relaxation. 


2 Definitions 


Differential Privacy. We use the standard notion of (approximate) differential 
privacy. Two vectors of inputs $x = (x_1, \ldots, x_n)$ and $x' = (x'_1, \ldots, x'_n)$ are called 
neighboring if they differ at a single index; i.e., if there exists an index $i$ such 
that $x_i \neq x'_i$ but $x_j = x'_j$ for $j \neq i$. Let $f$ denote a randomized process mapping 
a vector of inputs $(x_1, \ldots, x_n) \in D^n$ to an output in some range $R$. We say that 
$f$ satisfies $(\varepsilon, \delta)$-approximate differential privacy if for all neighboring vectors 
$x, x' \in D^n$ and subsets $R' \subseteq R$ we have 


$$\Pr[f(x) \in R'] \le e^{\varepsilon} \cdot \Pr[f(x') \in R'] + \delta.$$


If $f$ satisfies $(\varepsilon, 0)$-approximate differential privacy then we simply say that $f$ is 
$\varepsilon$-differentially private. For compactness, we abbreviate these as $(\varepsilon, \delta)$-DP/$\varepsilon$-DP. 


Local Differential Privacy and the Randomized Response Mechanism. 
In the setting of local differential privacy (LDP), each user $U_i$ applies a randomized
function $\mathcal{R}$ to their own input $x_i$ and then sends the result $y_i$ to an untrusted 
server. Translating the guarantees of differential privacy to this setting, we say 
that $\mathcal{R}$ is $(\varepsilon, \delta)$-LDP if for all $x, x' \in D$ and $R' \subseteq R$ we have 


$$\Pr[\mathcal{R}(x) \in R'] \le e^{\varepsilon} \cdot \Pr[\mathcal{R}(x') \in R'] + \delta.$$


If $\mathcal{R}$ is $(\varepsilon, 0)$-LDP then we simply say that $\mathcal{R}$ is $\varepsilon$-LDP. 
Let $\gamma \in (0,1)$ be a parameter, and let $D$ denote a discrete domain in which 
users’ inputs lie. The randomized response mechanism $\mathcal{R}_{\gamma,D}$ is defined as 


$$\mathcal{R}_{\gamma,D}(x) = \begin{cases} x & \text{with probability } 1 - \gamma \\ y \leftarrow D & \text{with probability } \gamma \end{cases},$$


i.e., with probability $\gamma$ a user replaces its input with a uniform value in $D$, and 
with the remaining probability leaves its input unchanged. It is not hard to show 
that if $\gamma \ge |D|/(e^{\varepsilon} + |D| - 1)$ then $\mathcal{R}_{\gamma,D}$ is $\varepsilon$-LDP. 
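The ε-LDP condition for randomized response can be checked directly from the output probabilities. The sketch below is illustrative only and assumes hypothetical function names: it samples the mechanism and computes the exact worst-case ratio Pr[R(x) = y] / Pr[R(x') = y], which is attained at y = x and equals (1 − γ)|D|/γ + 1.

```python
import math
import random
from fractions import Fraction


def randomized_response(x, gamma, domain, rng=random):
    """With probability gamma return a uniform element of domain, else return x."""
    if rng.random() < gamma:
        return rng.choice(domain)
    return x


def output_probability(y, x, gamma, domain_size):
    """Exact Pr[R(x) = y] for the mechanism above."""
    p = Fraction(gamma) / domain_size      # the uniform replacement lands on y
    if y == x:
        p += 1 - Fraction(gamma)           # the input was left unchanged
    return p


def min_gamma(eps, domain_size):
    """Smallest gamma with gamma >= |D| / (e^eps + |D| - 1)."""
    return domain_size / (math.exp(eps) + domain_size - 1)


def worst_case_ratio(gamma, domain_size):
    """Largest ratio Pr[R(x) = y] / Pr[R(x') = y] over x != x' (take y = x)."""
    same = output_probability(0, 0, gamma, domain_size)
    other = output_probability(0, 1, gamma, domain_size)
    return float(same / other)


eps, size = 1.0, 4
gamma = min_gamma(eps, size)
print(gamma, worst_case_ratio(gamma, size), math.exp(eps))
```

At the threshold value of γ the ratio equals $e^{\varepsilon}$ up to floating-point error, and any smaller γ violates the bound, which is why the condition is stated as a lower bound on γ.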


The Shuffle Model. In the shuffle model [4,8,16,36] each user $U_i$ computes 
$y_i \leftarrow \mathcal{R}(x_i)$ as in the local model, but then sends $y_i$ to a trusted “shuffler” $S$. 
After receiving a message from all n users, $S$ outputs the multiset (which can 
also be viewed as a histogram) $h = \{y_i\}$. If we overload notation and let $S$ also 
denote the process of mapping a list of elements to the multiset containing those 
elements, then $\mathcal{R}$ defines the randomized process 


$$S \circ \mathcal{R}^n \stackrel{\text{def}}{=} S \circ (\mathcal{R} \times \cdots \times \mathcal{R})(x_1, \ldots, x_n) = S(\mathcal{R}(x_1), \ldots, \mathcal{R}(x_n)).$$
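As an illustration of the process just defined, the following sketch models the ideal shuffler as a map from a list to a multiset (a `Counter`) and composes it with randomized response. It is a toy model with hypothetical names, not a secure implementation; in particular, no encryption toward the server is modeled.

```python
import random
from collections import Counter


def shuffler(messages):
    """Ideal shuffler S: forget the order and return the multiset (histogram)."""
    return Counter(messages)


def randomized_response(x, gamma, domain, rng):
    """With probability gamma return a uniform element of domain, else return x."""
    if rng.random() < gamma:
        return rng.choice(domain)
    return x


def shuffle_model(inputs, gamma, domain, rng):
    """Compute S(R(x_1), ..., R(x_n)) with independent local randomness per user."""
    return shuffler(randomized_response(x, gamma, domain, rng) for x in inputs)


rng = random.Random(7)
domain = [0, 1, 2]
inputs = [0, 0, 1, 2, 2, 2]
histogram = shuffle_model(inputs, 0.25, domain, rng)
print(sorted(histogram.items()))
```

The server sees only the histogram, so two input lists that are permutations of one another induce the same output distribution.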


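Balle et al. [4] showed that under certain conditions the shuffle model 
improves the privacy of an LDP mechanism.² 


Theorem 1. Let $\mathcal{R}$ be an $\varepsilon$-LDP mechanism. If $\varepsilon \le \log(n/\log(1/\delta))/2$, then 
$S \circ \mathcal{R}^n$ is $(\varepsilon', \delta)$-DP with $\varepsilon' = O\left(\min\{1, \varepsilon\} \cdot e^{\varepsilon} \sqrt{\log(1/\delta)/n}\right)$. 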


For the particular case of randomized response they show 


Theorem 2. Fiz values n,€,ô, and D. Ify > max { ee eee le \, then 
So RSh is (€,5)-DP. 


2 For clarity, we state a slightly looser bound than what they prove. 

Differentially Private Protocols. More generally, we may consider interactive 
protocols executed by a server and n users, each of whom initially holds an 
input $x_i$. The server has no input, and is the only party to generate an output. 
We say that a protocol $\Pi$ implements a (randomized) function $f$ if the honest 
execution of $\Pi$ when the users hold inputs $x_1, \ldots, x_n$, respectively, results in the 
server generating output distributed according to $f(x_1, \ldots, x_n)$. 

In this setting, the server’s view may contain more than just its output. It 
is also natural to consider that some of the users executing the protocol may 
themselves be corrupted and colluding with the server. (For simplicity, in what 
follows we assume semi-honest corruptions; i.e., we assume corrupted parties— 
including the server—follow the protocol as directed, but may then try to learn 
additional information based on their collective view of the protocol execution. 
The definitions can be extended in the obvious way to handle malicious
behavior.) Given a set of parties $A$ (that we assume by default always includes the 
server), we let $\mathrm{VIEW}_{\Pi,A}(x_1, \ldots, x_n)$ be the random variable denoting the joint 
view of the parties in $A$ in an execution of protocol $\Pi$ when the users initially 
hold inputs $x_1, \ldots, x_n$. Let $H$ denote the set of users not in $A$; let $x_A$ denote the 
inputs of users in $A$; and let $x_H$ denote the inputs of users outside of $A$. Then: 


Definition 1. Protocol $\Pi$ is $(\varepsilon, \delta)$-DP for t corrupted users if for any set $A$
containing the server and up to t users and any $x_A$, the function mapping $x_H$ to 
$\mathrm{VIEW}_{\Pi,A}(x_A, x_H)$ is $(\varepsilon, \delta)$-DP, i.e., for any neighboring $x_H, x'_H$ and any set $V$ 
of possible (joint) views of the parties in $A$, we have 


$$\Pr[\mathrm{VIEW}_{\Pi,A}(x_A, x_H) \in V] \le e^{\varepsilon} \cdot \Pr[\mathrm{VIEW}_{\Pi,A}(x_A, x'_H) \in V] + \delta.$$


The above can be relaxed to computational DP as well. 

One can also consider protocols operating in a hybrid world. The shuffle 
model is a special case of this, where the parties have access to an ideal
functionality $S$ implementing the shuffler. Concretely, the protocol $(\mathcal{R}_{\gamma,D} \times \cdots \times \mathcal{R}_{\gamma,D})^{S}$ 
corresponding to the randomized response mechanism is the one in which each 
user locally computes $y_i \leftarrow \mathcal{R}_{\gamma,D}(x_i)$ and then sends $y_i$ to $S$, which sends the 
result $\{y_i\} := S(y_1, \ldots, y_n)$ to the server. The fact that some of the users
themselves might be corrupted, however, now needs to be taken into account. For 
example, the following is a corollary of Theorem 2: 


Corollary 1. Fix n,t,¢,6, and D. If y > max { epee) ae then 


$(\mathcal{R}_{\gamma,D} \times \cdots \times \mathcal{R}_{\gamma,D})^{S}$ is $(\varepsilon, \delta)$-DP for t corrupted users in the $S$-hybrid model. 


Shuffle Protocols. A protocol $\Sigma$ run by n users and a server is a shuffle protocol 
if it implements $S$, i.e., if the output generated by the server when running $\Sigma$ is 
the multiset consisting of the users’ inputs. We are interested in shuffle protocols 
that ensure differential privacy when used to implement the shuffle model. Note, 
however, that we cannot use differential privacy to analyze a shuffle protocol; no 
shuffle protocol is differentially private, since two neighboring inputs $y, y'$ lead 
to different outputs. Instead, we use a related definition that we call differential 
obliviousness. Call vectors $y, y'$ neighboring if they differ by a transposition, i.e., 
there exist $i, j$ such that $y'_i = y_j$, $y'_j = y_i$, and $y'_k = y_k$ for $k \notin \{i, j\}$ (so $y'$ and 
$y$ are identical except the elements at positions $i, j$ are swapped). Then: 


Definition 2. Shuffle protocol $\Sigma$ is $(\varepsilon, \delta)$-differentially oblivious for t corrupted 
users if for any set $A$ containing the server and up to t users, any $y_A$, any 
neighboring $y_H, y'_H$, and any set $V$ of possible (joint) views of the parties in $A$, 


$$\Pr[\mathrm{VIEW}_{\Sigma,A}(y_A, y_H) \in V] \le e^{\varepsilon} \cdot \Pr[\mathrm{VIEW}_{\Sigma,A}(y_A, y'_H) \in V] + \delta.$$
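The neighboring relation used in Definition 2 differs from the one used for differential privacy: neighbors contain the same elements in a different order. The sketch below (hypothetical helper name) tests whether two vectors differ by a transposition exactly as defined above, and illustrates that such neighbors always yield the same multiset, so the shuffle output itself cannot distinguish them.

```python
from collections import Counter


def differ_by_transposition(y, y_prime):
    """True iff y_prime equals y with the entries at some positions i, j swapped.

    Following the definition literally, i = j (or swapping two equal entries)
    is allowed, so identical vectors count as neighboring.
    """
    if len(y) != len(y_prime):
        return False
    diff = [k for k in range(len(y)) if y[k] != y_prime[k]]
    if not diff:
        return True
    if len(diff) != 2:
        return False
    i, j = diff
    return y_prime[i] == y[j] and y_prime[j] == y[i]


y = ["a", "b", "c", "d"]
y_swapped = ["d", "b", "c", "a"]   # positions 0 and 3 swapped
y_rotated = ["b", "c", "d", "a"]   # a 4-cycle, not a single transposition
print(differ_by_transposition(y, y_swapped), differ_by_transposition(y, y_rotated))
```

Only the adversary's view of the protocol execution can separate the two orderings, and that view is what the definition constrains.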


3 Distributing the Privacy Blanket 


Generalizing the result of Balle et al. [4], we show that a differentially oblivious 
shuffle protocol suffices for implementing the shuffle model. Specifically: 


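Theorem 3. Let $\Sigma$ be a shuffle protocol that is $(\varepsilon, \delta)$-differentially oblivious for 
t corrupted users, and let $\mathcal{R}$ be an $\varepsilon_0$-LDP mechanism. For any $\delta'$ such that 
$\varepsilon_0 \le \log((n - t)/\log(1/\delta'))/2$, protocol $(\mathcal{R}^{\otimes n})^{\Sigma}$ is $(\varepsilon + \varepsilon', \delta + \delta')$-differentially 
private for t corrupted users, where $\varepsilon' = O\left(\max\{1, \varepsilon_0\} \cdot e^{\varepsilon_0} \sqrt{\log(1/\delta')/(n - t)}\right)$. 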


We prove the above in the full version of our paper; here, we focus on the 
particular case of randomized response. We show: 


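Theorem 4. Let $\Sigma$ be a shuffle protocol that is $(\varepsilon, \delta)$-differentially oblivious for 
t corrupted users. If $(\mathcal{R}_{\gamma,D} \times \cdots \times \mathcal{R}_{\gamma,D})^{S}$ is $(\varepsilon', \delta')$-differentially private for t 
corrupted users, then $(\mathcal{R}_{\gamma,D} \times \cdots \times \mathcal{R}_{\gamma,D})^{\Sigma}$ is $(\varepsilon + \varepsilon', \delta + \delta')$-differentially private 
for t corrupted users. 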


Overview of the Proof of Theorem 4. Throughout this section, we let $\Pi$ 
denote $\mathcal{R}_{\gamma,D} \times \cdots \times \mathcal{R}_{\gamma,D}$; our goal is to prove differential privacy of $\Pi^{\Sigma}$. We
provide a formal proof starting in the next subsection; here, we provide an overview. 

Fix some neighboring inputs $x = (x_A, x_H)$ and $x' = (x_A, x'_H)$, and some 
set of adversarial views $V$. (Each view in $V$ includes the views of the server 
and t corrupted users in an execution of $\Pi^{\Sigma}$.) Conceptually, we separate each 
view $v \in V$ into three components: a component $v_1$ reflecting the adversary’s 
view of the input to $\Sigma$ (in particular, $v_1$ includes the randomized inputs $y_A$ of 
the corrupted parties); the final multiset $h$ output by the server (which has the 
same distribution as the multiset that would be output by the shuffler in $\Pi^{S}$ 
conditioned on $v_1$); and the view $v_2$ that results from execution of $\Sigma$ itself. 

For some first component $v_1$ and output multiset $h$, let $Y(v_1, h)$ denote the 
set of (possibly modified) honest inputs $y_H$ to $\Sigma$ that are consistent with $v_1$, $h$, 
and $x$, and let $Y'(v_1, h)$ denote the set of $y_H$ consistent with $v_1$, $h$, and $x'$. Using 
Corollary 1 and letting $m = n - t$, we show (cf. Lemma 1): 


$$\sum_{(v_1,h)\,:\,(v_1,h,v_2) \in V} \Pr[v_1 \mid x] \cdot \Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x) \in Y(v_1,h) \mid v_1\right]$$

$$\le e^{\varepsilon'} \sum_{(v_1,h)\,:\,(v_1,h,v_2) \in V} \Pr[v_1 \mid x'] \cdot \Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x') \in Y'(v_1,h) \mid v_1\right] + \delta'. \qquad (1)$$

(Note that $\Pr[v_1 \mid x'] = \Pr[v_1 \mid x]$ since $v_1$ only depends on the true inputs of the 
corrupted parties.) For $v_1, h$ as above, let $V_2(v_1,h) = \{v_2 \mid (v_1, h, v_2) \in V\}$. In 
what is the most technical part of the proof, we then use differential obliviousness 
of $\Sigma$ to show (cf. Lemma 5) that for any $v_1, h$ we have 


$$\Pr_{y_H \leftarrow Y(v_1,h)}\left[v_2 \in V_2(v_1,h)\right] \le e^{\varepsilon} \cdot \Pr_{y'_H \leftarrow Y'(v_1,h)}\left[v_2 \in V_2(v_1,h)\right] + \delta. \qquad (2)$$


The proof of the above follows from a combinatorial analysis of the two sets $Y$ 
and $Y'$. Recall that an element in $Y$ and an element in $Y'$ are neighboring if 
they differ by a single transposition. Differential obliviousness of $\Sigma$ guarantees 
that neighboring vectors give rise to (roughly) the same view. If we can establish 
a bijection between $Y$ and $Y'$, mapping each element of $Y$ to a neighboring
element in $Y'$, Eq. (2) would follow immediately. Unfortunately, $Y$ and $Y'$ do not 
necessarily have the same size, and so such a bijection may not exist. Nevertheless,
we show how to extend $Y$ and $Y'$ to multisets $[Y]$ and $[Y']$ (by duplicating 
certain elements) having the same size, and so that the resulting multisets
preserve the probabilities of each vector (so sampling uniform $y_H \in Y$ gives the 
same distribution as sampling uniform $y_H \in [Y]$, and similarly for $Y'$ and $[Y']$). 
We then show that there is a bijection $\phi : [Y] \to [Y']$ such that $y_H$ and $\phi(y_H)$ 
are neighboring. This allows us to prove that Eq. (2) holds. 
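Since 


$$\Pr[(v_1,h,v_2) \in V \mid x] = \sum_{(v_1,h,v_2) \in V} \Pr[(v_1,h,v_2) \mid x]$$

$$= \sum_{(v_1,h)\,:\,(v_1,h,v_2) \in V} \Pr[v_1 \mid x] \cdot \Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x) \in Y(v_1,h) \mid v_1\right] \cdot \Pr_{y_H \leftarrow Y(v_1,h)}\left[v_2 \in V_2(v_1,h)\right],$$


combining Eqs. (1) and (2) allows us to prove Theorem 4. 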


3.1 Notation and Preliminaries 


We now formalize the preceding intuition. We assume t users are corrupted and 
let $m = n - t$ be the number of uncorrupted users. Fix some neighboring inputs 
$x = (x_A, x_H)$ and $x' = (x_A, x'_H)$, and for $i \in [m]$ let $x_{H,i}$ be the input of the 
ith honest user. Without loss of generality, we assume $x_H$ and $x'_H$ differ on the 
input of the mth user, and further assume that $x_{H,m} = 1$ and $x'_{H,m} = 2$. 


The Adversary’s View. We now make explicit the components of the
adversary’s view in an execution of $\Pi^{\Sigma}$ on input $x$. The first component of the view, 
which we denote by $v_1$, includes $y_A = (\mathcal{R}_{\gamma,D} \times \cdots \times \mathcal{R}_{\gamma,D})(x_A)$, i.e., the
adversary’s inputs to $\Sigma$. Following Balle et al. [4], we also include in $v_1$ the vector 
$b = (b_1, \ldots, b_m)$ indicating which of the honest users’ inputs are replaced by a 
random value, i.e., if $b_i = 0$ then $y_{H,i} = x_{H,i}$ and if $b_i = 1$ then $y_{H,i} \leftarrow D$. The 
second component of the adversary’s view is the multiset $h = S(y_A, y_H)$ output 
by $\Sigma$, in which $y = (y_A, y_H)$ denotes the vector of inputs the parties provide 
to $\Sigma$; note that parts of $y_H$ (corresponding to inputs that have not been randomized)
can be deduced from $v_1$. The third component $v_2$ of the adversary’s view 
consists of the entire view of the adversary in the execution of $\Sigma$ on inputs $y$. 
(Although $v_2$ determines $h$, we find it useful to treat $h$ separately.) 

For the rest of the proof, fix some set of views $V = \{(v_1, h, v_2)\}$. Note that 
views for which $b_m = 1$ are equiprobable regardless of whether the honest inputs 
are $x_H$ or $x'_H$; therefore, we assume without loss of generality that all views 
in $V$ have $b_m = 0$. We let $V' = \{(v_1, h) \mid \exists v_2 : (v_1, h, v_2) \in V\}$ and, for any 
$(v_1, h) \in V'$, we let $V_2(v_1, h) = \{v_2 \mid (v_1, h, v_2) \in V\}$. 

For some fixed $v_1, h$, let $Y(v_1, h)$ denote the set of honest inputs $y_H$ consistent 
with $v_1$, $h$, and $x$. That is, $Y(v_1, h)$ contains all $y_H \in D^m$ such that (1) for all 
$i$ with $b_i = 0$, we have $y_{H,i} = x_{H,i}$ (so, in particular, $y_{H,m} = x_{H,m} = 1$), and 
(2) $S(y_A, y_H) = h$ (where $y_A$ is fixed by $v_1$). Similarly, we let $Y'(v_1, h)$ denote 
the set of $y_H$ consistent with $v_1$, $h$, and $x'$. 


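3.2 Step 1: Using Local Differential Privacy of $\mathcal{R}_{\gamma,D}$ 

Lemma 1. If $\Pi^{S}$ is $(\varepsilon', \delta')$-DP for t corrupted users, then for any set of views $V$ 
and any pair of neighboring inputs $x, x'$, we have: 

$$\sum_{(v_1,h) \in V'} \Pr[v_1 \mid x] \cdot \Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x) \in Y(v_1,h) \mid v_1\right]
\le e^{\varepsilon'} \cdot \sum_{(v_1,h) \in V'} \Pr[v_1 \mid x'] \cdot \Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x') \in Y'(v_1,h) \mid v_1\right] + \delta'.$$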


The proof is given in the full version. 
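We also state a useful corollary. Define 


$$\Delta(v_1,h) \stackrel{\text{def}}{=} \max\left\{\Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x) \in Y(v_1,h) \mid v_1\right] - e^{\varepsilon'} \cdot \Pr\left[\mathcal{R}^{\otimes m}_{\gamma,D}(x') \in Y'(v_1,h) \mid v_1\right],\; 0\right\}.$$


Using the fact that $\Pr[v_1 \mid x] = \Pr[v_1 \mid x']$, we then have: 

Corollary 2. If $\Pi^{S}$ is $(\varepsilon', \delta')$-DP for t corrupted users, then for any set of 
views $V$ and any pair of neighboring inputs $x, x'$, it holds that: 


$$\sum_{(v_1,h) \in V'} \Pr[v_1 \mid x] \cdot \Delta(v_1,h) \le \delta'.$$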


3.3 Step 2: Using Differential Obliviousness of $\Sigma$ 


In this section we fix some $(v_1, h) \in V'$, and write $Y$, $Y'$, and $V_2$ for $Y(v_1,h)$, 
$Y'(v_1, h)$, and $V_2(v_1,h)$, respectively. For simplicity, we assume both $Y$ and $Y'$ 
are non-empty; the case where one or both are empty can be addressed by 
Lemma 1. Recall that if $y_H \in Y$ then $y_{H,m} = 1$, and if $y_H \in Y'$ then $y_{H,m} = 2$. 
Let $\bar{h}$ denote the multiset that remains after removing from $h$ the multiset 
given by the elements of $y_A$ and the multiset $\{x_{H,i} \mid b_i = 0, i \neq m\}$ (both of 
which are determined by $v_1$). Let $c_1$ be the number of 1’s in $\bar{h}$, and let $c_2$ be 
the number of 2’s in $\bar{h}$; note that $c_1, c_2 \neq 0$ since $Y$ and $Y'$ are non-empty. The 
following characterizes the relative sizes of $Y$ and $Y'$ in terms of $c_1$ and $c_2$: 


Lemma 2. $\dfrac{|Y|}{|Y'|} = \dfrac{c_1}{c_2}$. 


Proof. Let $C$ be the number of ways of distributing all the elements of $\bar{h}$ that 
are not equal to 1 or 2 among the honest users who have changed their inputs. 
A vector $y_H$ is consistent with $v_1$, $h$, and $x$ only if a 1 is associated with the last 
user, and the remaining $c_1 + c_2 - 1$ elements of $\bar{h}$ that are 1 or 2 are distributed 
among the $c_1 + c_2 - 1$ users who remain from those who have changed their 
inputs. Thus, 

$$|Y| = C \cdot \binom{c_1 + c_2 - 1}{c_1 - 1}.$$

Similarly, 

$$|Y'| = C \cdot \binom{c_1 + c_2 - 1}{c_2 - 1}.$$

The lemma follows, since the ratio of the two binomial coefficients is $c_1/c_2$. 
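The counting argument can be sanity-checked by brute force on a small instance. The sketch below is an illustration under assumptions: `h_bar` is a hypothetical residual multiset with three 1's, two 2's, and one other element, and its positions stand for the honest users who randomized their inputs, followed by the last honest user in the final slot. Helper names are not from the paper.

```python
from itertools import permutations
from math import comb


def consistent_vectors(h_bar, last_value):
    """Distinct orderings of the multiset h_bar whose final entry is last_value."""
    return {p for p in permutations(h_bar) if p[-1] == last_value}


def placements_of_others(vectors):
    """C: distinct placements of the entries other than 1 or 2 before the final slot."""
    return len({tuple(v if v not in (1, 2) else None for v in p[:-1]) for p in vectors})


h_bar = [1, 1, 1, 2, 2, 3]                 # hypothetical example: c1 = 3, c2 = 2
c1, c2 = h_bar.count(1), h_bar.count(2)
Y = consistent_vectors(h_bar, 1)           # last honest user holds 1 (input x)
Y_prime = consistent_vectors(h_bar, 2)     # last honest user holds 2 (input x')
C = placements_of_others(Y)

print(len(Y), len(Y_prime), C)
print(len(Y) == C * comb(c1 + c2 - 1, c1 - 1))
print(len(Y_prime) == C * comb(c1 + c2 - 1, c2 - 1))
```

Cross-multiplying avoids fractions: the lemma is equivalent to $|Y| \cdot c_2 = |Y'| \cdot c_1$, which is the form checked for this instance.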


Lemma 3. For every $y_H \in Y$, there are $c_2$ vectors in $Y'$ that result from
transposing the final entry of $y_H$ with some other entry of $y_H$. Similarly, for every 
$y'_H \in Y'$, there are $c_1$ vectors in $Y$ that result from transposing the final entry 
of $y'_H$ with some other entry of $y'_H$. 


Proof. We prove the first statement; the second follows symmetrically. Fix some 
$y_H \in Y$. The final entry of $y_H$ is 1, and there are $c_2$ other entries of $y_H$ that 
are equal to 2 and that correspond to users who have changed their inputs. 
Transposing the final entry of $y_H$ with the entries at any of those locations gives 
a vector in $Y'$. 


Mapping Between $Y$ and $Y'$. Ideally, we would like to construct a bijection
between $Y$ and $Y'$ such that a vector in $Y$ is mapped to a vector in $Y'$ iff they
are transpositions of each other. Then for each pair of such vectors $y_H$ and $y'_H$,
we could argue that $\mathrm{VIEW}_{\Sigma,A}(y_A, y_H)$ and $\mathrm{VIEW}_{\Sigma,A}(y_A, y'_H)$ must be “close”
by differential obliviousness of $\Sigma$. Unfortunately, as shown in Lemma 2, the
cardinalities of $Y$ and $Y'$ might be different, so such a bijection might not exist.
To resolve this issue, we “duplicate” vectors in $Y$ and $Y'$ so that the resulting
multisets $[Y]$ and $[Y']$ have the same cardinality. Concretely, we let $[Y]$ be a
multiset consisting of $c_2$ copies of each element $y_H \in Y$. Similarly, we let $[Y']$ be
a multiset consisting of $c_1$ copies of each element $y'_H \in Y'$. Note that sampling
uniformly from $[Y]$ (resp., $[Y']$) is equivalent to sampling uniformly from $Y$
(resp., $Y'$). Moreover, by Lemma 2, $[Y]$ and $[Y']$ have the same size. We show:


Lemma 4. There is a bijection $\phi: [Y] \to [Y']$ such that for every $y_H \in [Y]$, the
vector $\phi(y_H) \in [Y']$ is a transposition of $y_H$.

Proof. Consider the bipartite graph $G$ with vertex sets $[Y]$ and $[Y']$, where there
is an edge between $y_H \in [Y]$ and $y'_H \in [Y']$ iff $y'_H$ results from transposing the
final entry of $y_H$ with some other entry of $y_H$. Using Lemma 3 and the fact that
every vector in $Y'$ is included $c_1$ times in $[Y']$, we see that each $y_H \in [Y]$ has
exactly $c_1 \cdot c_2$ edges. Reasoning analogously, each $y'_H \in [Y']$ has $c_1 \cdot c_2$ edges.
Since $G$ is regular, Hall's condition is satisfied, and Hall's marriage theorem
implies that $G$ has a complete matching, which is also
a perfect matching since $[Y]$ and $[Y']$ have the same size. Any such matching
constitutes a bijection $\phi$ as claimed by the lemma.
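
To make the matching argument concrete, the following sketch builds $[Y]$ and $[Y']$ for a toy case and finds a perfect matching with a standard augmenting-path search. It is an illustration under assumptions rather than the authors' construction: it assumes every changed input is a 1 or a 2, and all function names are hypothetical.

```python
from itertools import permutations


def build_sides(c1, c2):
    """Return the multisets [Y] and [Y'] as lists of (vector, copy_index)."""
    vectors = set(permutations([1] * c1 + [2] * c2))
    y = sorted(v for v in vectors if v[-1] == 1)
    y_prime = sorted(v for v in vectors if v[-1] == 2)
    left = [(v, k) for v in y for k in range(c2)]         # c2 copies of each y_H
    right = [(v, k) for v in y_prime for k in range(c1)]  # c1 copies of each y'_H
    return left, right


def is_final_transposition(u, v):
    """True if v is u with its final entry swapped with one other, different entry."""
    diff = [i for i in range(len(u)) if u[i] != v[i]]
    last = len(u) - 1
    return (len(diff) == 2 and diff[1] == last
            and u[diff[0]] == v[last] and u[last] == v[diff[0]])


def perfect_matching(c1, c2):
    left, right = build_sides(c1, c2)
    adj = [[j for j, (v, _) in enumerate(right) if is_final_transposition(u, v)]
           for (u, _) in left]
    match_right = [-1] * len(right)  # match_right[j] = index of the left vertex matched to j

    def try_augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    matched = sum(try_augment(i, set()) for i in range(len(left)))
    return left, right, adj, match_right, matched


left, right, adj, match_right, matched = perfect_matching(2, 3)
assert len(left) == len(right)                # equal sizes, as Lemma 2 implies
assert all(len(a) == 2 * 3 for a in adj)      # every vertex of [Y] has c1 * c2 edges
assert matched == len(left)                   # the matching is perfect
```

Reading `match_right` backwards gives the bijection: the copy of $y'_H$ at index `j` is the image of the copy of $y_H$ at index `match_right[j]`.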


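Recall that the third component of the adversary's view, $v_2$, is equal to
$\mathrm{VIEW}_{\Sigma,A}(y_A, y_H)$. We may now prove the main result of this section.


Lemma 5. If $\Sigma$ is $(\epsilon,\delta)$-differentially oblivious for $t$ corrupted users, then

$$\Pr_{y_H \leftarrow Y}\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y_H) \in V_2\right] \le e^{\epsilon} \cdot \Pr_{y'_H \leftarrow Y'}\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y'_H) \in V_2\right] + \delta.$$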


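Proof. Let $\phi: [Y] \to [Y']$ be a bijection as guaranteed by Lemma 4. Differential
obliviousness of $\Sigma$ implies that for any $y_H \in [Y]$:

$$\Pr\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y_H) \in V_2\right] \le e^{\epsilon} \cdot \Pr\left[\mathrm{VIEW}_{\Sigma,A}(y_A, \phi(y_H)) \in V_2\right] + \delta.$$

Recalling that $[Y]$ and $[Y']$ have the same size, we thus have

\begin{align*}
\Pr_{y_H \leftarrow Y}\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y_H) \in V_2\right]
  &= \Pr_{y_H \leftarrow [Y]}\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y_H) \in V_2\right] \\
  &= \sum_{y_H \in [Y]} \frac{\Pr\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y_H) \in V_2\right]}{|[Y]|} \\
  &\le \sum_{y_H \in [Y]} \frac{e^{\epsilon} \cdot \Pr\left[\mathrm{VIEW}_{\Sigma,A}(y_A, \phi(y_H)) \in V_2\right] + \delta}{|[Y]|} \\
  &= \sum_{y'_H \in [Y']} \frac{e^{\epsilon} \cdot \Pr\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y'_H) \in V_2\right] + \delta}{|[Y']|} \\
  &= e^{\epsilon} \cdot \Pr_{y'_H \leftarrow Y'}\left[\mathrm{VIEW}_{\Sigma,A}(y_A, y'_H) \in V_2\right] + \delta.
\end{align*}

Combining Corollary 2 and Lemma 5 allows us to prove Theorem 4. Details
are given in the full version.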


4 A Differentially Oblivious Shuffle Protocol 


In this section, we describe a construction of a differentially oblivious shuffler. 
We present the protocol in Sect. 4.1 and analyze its obliviousness (for a semi- 
honest adversary) in Sects. 4.2 and 4.3. We compare its concrete performance 
to relevant prior work in Sect. 4.4. We defer a discussion of how to deal with 
malicious behavior to the full version.


Inputs: Each user $i$ has input $y_i$.

Round 1: Each user chooses $r - 1$ users $i_1, \ldots, i_{r-1} \leftarrow [n]$ uniformly
and independently, and then forms the onion encryption $C_r$ as
described in the text. It sends $C_r$ to user $i_1$.

Rounds $\ell = 2, \ldots, r - 1$: For each ciphertext $C_{r-\ell+2}$ received in the
previous round, compute $(i_\ell, C_{r-\ell+1}) := \mathrm{Dec}_{sk_{i_{\ell-1}}}(C_{r-\ell+2})$ and
forward $C_{r-\ell+1}$ to user $i_\ell$.

Round $r$: For each ciphertext $C_2$ received in the previous round,
compute $(S, C_1) := \mathrm{Dec}_{sk_{i_{r-1}}}(C_2)$ and forward $C_1$ to the server $S$.

Output: $S$ initializes $h := \emptyset$. Then, for each ciphertext $C$ received in
the previous round, compute $y := \mathrm{Dec}_{sk_S}(C)$ and add $y$ to $h$.


Fig. 1. A differentially oblivious shuffling protocol, parameterized by $r$.


4.1 A Shuffling Protocol 


Recall that in our setting we have $n$ users holding inputs $y_1, \ldots, y_n$, respectively,
who would like a server (that we treat as distinct from the $n$ users) to
learn the multiset $h = \{y_i\}$. We assume the parties have public/private keys
$(pk_1, sk_1), \ldots, (pk_n, sk_n)$, respectively, and that the server has keys $(pk_S, sk_S)$.
Our protocol, which is based on onion routing [21,33], works as follows. Let $r$ be
a parameter that we fix later. Each user $U$ chooses $r - 1$ users $i_1, \ldots, i_{r-1} \leftarrow [n]$
uniformly and independently (it may be that $U$ chooses itself), and then forms
a nested (“onion”) encryption of the form

$$C_r = \mathrm{Enc}_{pk_{i_1}}\bigl(i_2, \mathrm{Enc}_{pk_{i_2}}\bigl(i_3, \cdots \bigl(i_{r-1}, \mathrm{Enc}_{pk_{i_{r-1}}}(S, \mathrm{Enc}_{pk_S}(y))\bigr) \cdots \bigr)\bigr),$$

such that at each “layer” the identity of the next receiver is encrypted along
with an onion encryption whose outer layer can be removed by that receiver. In
the first round, $U$ sends $C_r$ to the first receiver $i_1$, who decrypts to remove the
outer layer and thus obtains $i_2$ and an onion encryption $C_{r-1}$ that it forwards
to $i_2$ in the next round. This process continues for $r - 1$ rounds, until in the $r$th
round all parties send the ciphertext $\mathrm{Enc}_{pk_S}(y)$ they have obtained to the server.
(We assume a synchronous communication network.) See Fig. 1.

The protocol requires $r$ rounds of communication, and the total number of
ciphertexts transmitted is exactly $rn$. Since ciphertexts have length $O(r \log n)$,
the total communication complexity is $O(r^2 n \log n)$.
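
The message flow of the protocol can be traced with a small simulation. This is a sketch under loud assumptions: `enc` and `dec` are toy stand-ins that only tag a payload with its recipient (they provide no secrecy at all), the server is represented by the string `"S"`, and every function name is hypothetical. It is meant to show the layering and the count of $rn$ transmitted ciphertexts, nothing more.

```python
import random


def enc(recipient, payload):
    """Toy stand-in for public-key encryption: tags the payload with its recipient."""
    return ("enc", recipient, payload)


def dec(recipient, ciphertext):
    """Toy decryption: only the tagged recipient may open the ciphertext."""
    tag, owner, payload = ciphertext
    assert tag == "enc" and owner == recipient
    return payload


def build_onion(y, path):
    """Wrap y for the server, then add one layer per hop, innermost hop first."""
    onion = enc("S", y)
    next_hop = "S"
    for hop in reversed(path):
        onion = enc(hop, (next_hop, onion))
        next_hop = hop
    return onion


def shuffle(inputs, r, rng):
    """Run the r-round protocol (r >= 2); return the server's multiset and the message count."""
    n = len(inputs)
    sent = 0
    inboxes = {u: [] for u in range(n)}
    # Round 1: each user picks r - 1 hops and sends its onion to the first hop.
    for y in inputs:
        path = [rng.randrange(n) for _ in range(r - 1)]
        inboxes[path[0]].append(build_onion(y, path))
        sent += 1
    # Rounds 2..r: every holder peels one layer and forwards the inner onion.
    server_inbox = []
    for _ in range(2, r + 1):
        next_inboxes = {u: [] for u in range(n)}
        for u, onions in inboxes.items():
            for c in onions:
                next_hop, inner = dec(u, c)
                (server_inbox if next_hop == "S" else next_inboxes[next_hop]).append(inner)
                sent += 1
        inboxes = next_inboxes
    h = sorted(dec("S", c) for c in server_inbox)  # the server only learns the multiset
    return h, sent


h, sent = shuffle([5, 3, 9, 3], r=4, rng=random.Random(0))
assert h == [3, 3, 5, 9]
assert sent == 4 * 4  # exactly r * n ciphertexts
```

A real instantiation would replace `enc`/`dec` with an actual public-key scheme and pad the onions to a common length.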


4.2 Analysis of Obliviousness ($\epsilon = 0$)


We assume a semi-honest adversary who corrupts up to $t$ users as well as the
server $S$. The attacker has access to the state of any corrupted user, and can
also determine which user sent any message that it received. However, we assume
the attacker cannot eavesdrop on the communication between honest users, so
in particular it cannot tell whether some honest user $i$ sent a message to some
other honest user $j$ in some round. We treat encryption as ideal in our analysis
of obliviousness in order to simplify our treatment.

Assume without loss of generality that users $U_1, U_2$ are honest and hold
different inputs, and fix input vectors $y$ and $y'$ that are identical except the
inputs of $U_1$ and $U_2$ are swapped. Let $i^1_\ell$ denote the $\ell$th intermediate user chosen
by $U_1$, for $1 \le \ell \le r - 1$, and set $i^1_0 = 1$; define $i^2_0, \ldots, i^2_{r-1}$ similarly. (We let
round 0 refer to the beginning of the algorithm when $U_1$ and $U_2$ each hold their
own input.) Say that $U_1$ and $U_2$ can swap at round $j$ (with $0 \le j < r - 1$)
if the routing paths of $U_1$ and $U_2$ both have an honest user in rounds $j$ and
$j + 1$ (i.e., for which users $i^1_j$, $i^1_{j+1}$, $i^2_j$, and $i^2_{j+1}$ are all honest). A key observation
is that if there exists some $j$ such that $U_1$ and $U_2$ can swap at round $j$ then
the distributions on the attacker's views are identical regardless of whether the
input vector is $y$ or $y'$. The reason for this is that it is equally likely that the
onion encryption of $U_1$ was routed from $i^1_j$ to $i^1_{j+1}$ and that of $U_2$ went from
$i^2_j$ to $i^2_{j+1}$, or that the communication was “flipped” (in which case we say the
swap happened) so that the onion encryption of $U_1$ was routed from $i^1_j$ to $i^2_{j+1}$
and that of $U_2$ went from $i^2_j$ to $i^1_{j+1}$. In other words, if there exists some $j$ such
that $U_1$ and $U_2$ can swap at round $j$, then perfect obliviousness is achieved. If
we let $x_{t,r}$ denote the probability of this event in an execution of the protocol
with parameter $r$ when up to $t$ users are corrupted, we have:


Theorem 5. The protocol in Fig. 1 is $(0, 1 - x_{t,r})$-differentially oblivious for $t$
corrupted users.


Our problem is now reduced to lower bounding $x_{t,r}$. Let $p_t = (1 - t/n)^2$ be
the probability that $U_1$ and $U_2$ both choose an honest user in some fixed round
$j \ge 1$ when $t$ users are corrupted. By definition, we have $x_{t,1} = 0$, and $x_{t,2} = p_t$
since both $U_1$ and $U_2$ are honest in round 0. By conditioning on the outcomes of
the final two rounds, we can derive the following recurrence relation for $r > 2$:

$$x_{t,r} = p_t^2 + (1 - p_t) \cdot x_{t,r-1} + p_t (1 - p_t) \cdot x_{t,r-2}.$$

Although it is possible to solve this recurrence, it is cleaner to simply bound $x_{t,r}$
for any desired $t, r$. The following can be proved by induction on $r$:


Theorem 6. For $r > 1$, it holds that $x_{n/3,r} \ge 1 - 0.85^r$. Thus, for $r > 1$ the
protocol of Fig. 1 is $(0, 0.85^r)$-differentially oblivious for $n/3$ corrupted users.

For $r > 1$, it holds that $x_{n/2,r} \ge 1 - 0.95^r$. Thus, for $r > 1$ the protocol of
Fig. 1 is $(0, 0.95^r)$-differentially oblivious for $n/2$ corrupted users.
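
The recurrence is easy to evaluate numerically. The sketch below (function names are hypothetical) computes $x_{t,r}$ and, separately, the failure probability $1 - x_{t,r}$, which satisfies the same recurrence without the $p_t^2$ term; tracking it directly avoids cancellation when $x_{t,r}$ is very close to 1. The final loop checks the bounds of Theorem 6 only over a finite range of $r$, so it is a sanity check rather than a proof.

```python
def swap_probability(p, r):
    """x_{t,r}: probability that U1 and U2 can swap at some round, with p = p_t."""
    if r == 1:
        return 0.0
    prev2, prev1 = 0.0, p  # x_{t,1}, x_{t,2}
    for _ in range(3, r + 1):
        prev2, prev1 = prev1, p * p + (1 - p) * prev1 + p * (1 - p) * prev2
    return prev1


def failure_probability(p, r):
    """1 - x_{t,r}, computed directly from the equivalent recurrence."""
    if r == 1:
        return 1.0
    prev2, prev1 = 1.0, 1 - p  # 1 - x_{t,1}, 1 - x_{t,2}
    for _ in range(3, r + 1):
        prev2, prev1 = prev1, (1 - p) * prev1 + p * (1 - p) * prev2
    return prev1


p_third = (1 - 1 / 3) ** 2  # t = n/3
p_half = (1 - 1 / 2) ** 2   # t = n/2
for r in range(2, 200):
    assert failure_probability(p_third, r) <= 0.85 ** r
    assert failure_probability(p_half, r) <= 0.95 ** r
```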


4.3 Analysis of Obliviousness ($\epsilon > 0$)


We show here an alternate analysis that allows us to prove $(\epsilon,\delta)$-differential
obliviousness for $\epsilon > 0$. (This analysis is incomparable to the analysis of the
previous section since, for fixed $r$, we may obtain larger $\epsilon$ but smaller $\delta$.)

We focus again on the case where we have input vectors $y$ and $y'$ that are
identical except that the inputs of honest users $U_1$ and $U_2$ are swapped. The
observation we rely on here is that even if there is no round $j$ at which $U_1$ and $U_2$
can swap, it is still possible to achieve some privacy if their inputs
can be swapped via some other honest users. For example, say there is an honest
user $U_3$ and $0 \le j < j' < j'' < r - 1$ such that (1) $U_1$ and $U_3$ can swap at
round $j$, (2) $U_2$ and $U_3$ can swap at round $j'$, and (3) $U_1$ and $U_3$ can swap at
round $j''$. Then the following events lead to the same view for the adversary: the
input vector was $y$ and none of the swaps happens; the input vector was $y$ and
(only) swaps #1 and #3 happen; or the input vector was $y'$ and all three swaps
happen. This gives some privacy (given a view consistent with these events, the
adversary cannot determine with certainty whether the input was $y$ or $y'$), but
the privacy is not perfect: since each swap is equally likely to happen or not,
conditioned on the adversary's view being consistent with the above, input $y$ is
twice as likely as input $y'$. In this particular example the level of privacy obtained
is relatively low, but privacy improves as more honest users can potentially be
involved in the swaps.
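
The three-swap example can be checked by enumerating all eight swap outcomes. The sketch below uses a deliberately simplified model that is an assumption of this illustration, not the paper's formal view: each of the three users' routing paths is a slot, a swap exchanges the onions travelling on two slots, and the adversary's view is reduced to which input finally emerges from each slot. All names are hypothetical.

```python
from itertools import product

# Swaps in round order, as pairs of 0-based path indices (U1 = 0, U2 = 1, U3 = 2).
SWAPS = [(0, 2), (1, 2), (0, 2)]  # (1) U1<->U3, (2) U2<->U3, (3) U1<->U3


def deliver(inputs, happened):
    """Apply the swaps flagged in `happened`, in order; return the input emerging from each path."""
    slots = list(inputs)
    for (a, b), flag in zip(SWAPS, happened):
        if flag:
            slots[a], slots[b] = slots[b], slots[a]
    return tuple(slots)


def count_consistent(inputs, target):
    """Number of the 8 equally likely swap outcomes that produce `target`."""
    return sum(deliver(inputs, h) == target for h in product([False, True], repeat=3))


y = ("a", "b", "c")        # U1 holds a, U2 holds b
y_prime = ("b", "a", "c")  # inputs of U1 and U2 exchanged
target = deliver(y, (False, False, False))

assert count_consistent(y, target) == 2        # no swaps, or swaps #1 and #3
assert count_consistent(y_prime, target) == 1  # all three swaps
```

The 2:1 count is the "twice as likely" statement in the text; in this toy model it corresponds to a likelihood ratio of 2 for that view.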

In the full version we give a more detailed analysis of the $\epsilon, \delta$ parameters
obtained by considering swaps between multiple honest users; here we simply
describe the qualitative conclusions of the analysis. Say $U_1$ and $U_2$ are
swap-compatible if there are $0 \le j < j' < j'' < r - 1$ such that (1) the routing
path of $U_1$ has an honest user in rounds $j$ and $j + 1$ as well as rounds $j''$ and
$j'' + 1$, and (2) the routing path of $U_2$ has an honest user in rounds $j'$ and $j' + 1$
(or the similar event with the roles of $U_1$ and $U_2$ interchanged). If $U_1, U_2$ are
swap-compatible then $U_1$ can potentially swap with some other honest users at
round $j$, other honest users can potentially swap with $U_2$ at round $j'$, and then
$U_1$ can again potentially swap with other honest users at round $j''$. For that
to occur requires other honest users who can potentially swap with $U_1, U_2$ at
the appropriate rounds; roughly speaking, the more honest users can swap with
$U_1, U_2$, the higher privacy will be achieved for $U_1, U_2$.

Let $\delta_1$ denote the probability that $U_1, U_2$ are not swap-compatible. Next, fix
some desired value for $\epsilon > 0$. When $U_1, U_2$ are swap-compatible, we can derive
a lower bound $m$ on the number of other honest users that need to be able to
swap with $U_1, U_2$ (we do not define this event more formally here) to ensure
privacy bound $\epsilon$. Letting $\delta_2$ be the probability that there are fewer than $m$ other
honest users who can swap with $U_1, U_2$, we can then conclude that our protocol
achieves $(\epsilon, \delta_1 + \delta_2)$-differential obliviousness. Note that $\delta_1$ depends only on the
corruption threshold and the number of rounds $r$, and decreases exponentially
with $r$ as in the $\epsilon = 0$ case. On the other hand, $\delta_2$ also depends on the total
number of parties $n$ as well as the privacy parameter $\epsilon$ (since decreasing $\epsilon$ requires
increasing $m$, which in turn increases the probability $\delta_2$ of failing to have $m$ other
honest users who can swap with $U_1, U_2$).


4.4 Performance Analysis 


To analyze the performance of our protocol and compare it with prior work,
we assume encryption is done using the KEM-DEM paradigm, with the KEM
portion having a length of 256 bits. We allocate 20 bits for user identities, which
suffices for up to $n = 2^{20}$ users,³ and we assume users' inputs are 128 bits long.
The innermost ciphertext thus requires 256 + 128 = 384 bits, and in each of the
other layers we add 256 bits for the next key encapsulation plus 20 bits for the
user ID. An $r$-layer onion ciphertext thus requires $384 + 276(r - 1)$ bits.

Fig. 2. Round complexity and per-user communication complexity for achieving $\epsilon = 0$
and different $\delta$ for various corruption thresholds, assuming 20-bit user IDs.
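
The ciphertext-size formula translates directly into a per-user cost estimate. The sketch below rests on two assumptions that the text does not state explicitly: each user sends, on average, one onion per round (with $r$ layers in round 1 down to one layer in round $r$), and "KB" means 1024 bytes. The function names are hypothetical. Under these assumptions $r = 43$ comes to roughly 32 KB, which agrees with the 43-round example quoted in this section.

```python
KEM_BITS, ID_BITS, INPUT_BITS = 256, 20, 128


def onion_bits(layers):
    """Size of an onion ciphertext with the given number of layers: 384 + 276 * (layers - 1)."""
    return (KEM_BITS + INPUT_BITS) + (KEM_BITS + ID_BITS) * (layers - 1)


def per_user_bits(r):
    """Assumed average per-user traffic: one onion per round, shrinking by a layer each round."""
    return sum(onion_bits(k) for k in range(1, r + 1))


def per_user_kib(r):
    return per_user_bits(r) / 8 / 1024


assert onion_bits(1) == 384
assert 32 < per_user_kib(43) < 33
```

Because the sum of onion sizes grows like $r^2$, halving the number of rounds cuts the per-user traffic by roughly a factor of four.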


The $\epsilon = 0$ case. In Fig. 2, we give the number of rounds and per-user
communication complexity needed to achieve $(0, \delta)$-differential obliviousness for several
values of $\delta$ and various corruption thresholds. Note that these results are
independent of the number of parties $n$. Our results compare favorably to prior work
of Movahedi et al. [31], especially when the number of parties is large. In
particular, for a corruption threshold of $t \approx n/3$ the protocol of Movahedi et al. [31]
uses 500 rounds and communication of 128 MB per user when $n = 33{,}000$, and
approximately 0.5–1 GB over 1,000 rounds when $n = 10^6$.

Additionally, note that $\delta$ is often set to be $10^{-4} > \delta > 10^{-6}$ in the differential
privacy literature. Using that range of values, we require $r \approx 55$–$83$ with $n/3$
corrupted users, and our per-user communication cost is reduced to 53–119 KB.


The $\epsilon > 0$ case. In Fig. 3, we show how $\delta = \delta_1 + \delta_2$ relates to $n$, $r$, $t$,
and $\epsilon$. Specifically, in Fig. 3(a) we show how the round/communication
complexity depends on $\delta_1$, and in Fig. 3(b) we show how $\epsilon$ varies with $\delta_2$.

We can use these figures to determine how to set parameters. For example,
say we have $n = 12{,}000$ users and up to $t = n/3$ corruptions, and want to
determine the $\delta$ achievable for $\epsilon = 1$. From Fig. 3(b) we see that $\delta_2 \approx 2^{-23}$.
Using Fig. 3(a), we see that 43 rounds suffice for $\delta_1 \approx 2^{-23}$. Thus, the protocol
is $(1, 2^{-22})$-differentially oblivious with 43 rounds. Assuming 20 bits for the user
IDs, this corresponds to per-user communication of 32 KB.


3 In fact, these identifiers are the only part of our construction that contributes to the
$O(\log n)$ multiplicative factor in the overhead.

Fig. 3. (a) Round complexity and per-user communication complexity for achieving
different $\delta_1$ for various corruption thresholds, assuming 20-bit user IDs. (b) $\epsilon$ vs. $\delta_2$ for
various corruption thresholds and different $n$.


5 Malicious Security 


We briefly discuss how to address malicious attacks affecting privacy; denial-of- 
service attacks and other attacks that affect correctness are out of scope. If the 
encryption scheme used by the protocol is non-malleable, and timestamps and 
identifiers are included in each layer of the onion to prevent replay attacks [11], 
then the only attack an adversary can carry out on the protocol of Sect. 4 is to 
drop messages to reduce the effective number of honest users contributing to the 
output histogram and thereby degrade privacy (cf. Corollary 1). 

As in the work of Ando et al. [1], we can address such an attack by having hon- 
est users (1) route dummy messages alongside their real messages, (2) check part- 
way through the shuffling that their dummy messages have not been dropped, 
and (3) abort the protocol if malicious behavior is detected. Compared to the 
work of Ando et al., however, we can achieve security against malicious behav- 
ior with much lower overhead, both because we assume the adversary cannot 
eavesdrop on communication between honest users and also because we focus 
on the eventual application of our protocol to the shuffle model. With regard to 
the latter point, note that although dropping even a single user’s input can be 
catastrophic for differential obliviousness of a shuffling protocol (e.g., if y and 
y’ are input vectors that differ by a transposition of the inputs of users 1 and 2, 
and the input of user 1 is dropped), dropping a few users’ inputs has only a 
small effect on end-to-end differential privacy when the shuffle protocol is used 
to instantiate the shuffle model. Concretely, let $S_d$ represent an ideal shuffler
that is identical to $S$ except that the adversary can select $d$ honest users whose
messages are dropped. The following is a natural extension of Corollary 1:


Lemma 6. Fix $n, t, d, \epsilon, \delta$, and $D$. If $\gamma \ge \max\{\ldots\}$, then
$(R_{\gamma,D} \times \cdots \times R_{\gamma,D}) \circ S_d$ is $(\epsilon,\delta)$-DP for $t$ corrupted users in the $S_d$-hybrid model.


It thus suffices to realize $S_d$ for small $d$. We describe our approach for doing so
somewhat informally, and leave a detailed analysis for the full version. Let $r, s$
be two parameters. At a high level, our modified protocol has four stages:

1. Each user $U_i$ runs the onion-routing protocol from Sect. 4 twice, in parallel.
It sends its real input $y_i$ to the server using $r + s - 1$ intermediate hops,
and sends a random dummy value to a randomly selected user $R_i$ (called
a “checker”) using $r - 1$ intermediate hops. Appropriate padding is used to
make sure the onion encryptions are indistinguishable.

2. After round $r$, each user $U_i$ asks $R_i$ to respond with the random dummy value
chosen by $U_i$. If $R_i$ responds with the correct value, then $U_i$ sets $\mathsf{cheat}_i := 0$;
otherwise, it sets $\mathsf{cheat}_i := 1$.

3. The users run a protocol to determine whether any user set cheat = 1. (We 
discuss below how this can be implemented efficiently.) If so, they all abort 
and do not run the next phase. 

4. Parties run the onion-routing protocol on the remaining real messages. 


The overall argument for why this preserves privacy is as follows. Prior to 
round r, the adversary cannot distinguish real onion encryptions from dummy 
onion encryptions. Setting parameters appropriately, we can ensure that if a 
malicious adversary drops d or more of the honest users’ onion encryptions before 
round r, then with high probability at least one of those will correspond to a 
dummy message associated with an honest checker; in that case, cheating will be 
detected and all honest users will abort. This, in turn, means that the real input 
of an honest user will be completely hidden from the adversary by the onion 
encryption unless the final s intermediate users chosen by that honest user for 
the onion-routing of its real message are all corrupted. The probability that this 
occurs for some honest user is at most $n \cdot (t/n)^s$.
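
The union bound $n \cdot (t/n)^s$ suggests a simple way to pick the tail length $s$. The helper below is a hypothetical sketch: the function names and the target probability are assumptions chosen for illustration, and the bound covers only the exposure event described above, not the full analysis deferred to the full version.

```python
def exposure_bound(n, t, s):
    """Union bound n * (t/n)^s on some honest user's final s hops all being corrupted."""
    return n * (t / n) ** s


def min_tail_length(n, t, target):
    """Smallest s for which the exposure bound is at most `target` (requires 0 < t < n)."""
    s = 1
    while exposure_bound(n, t, s) > target:
        s += 1
    return s


# Illustrative target only: one million users, one third corrupted.
s_needed = min_tail_length(1_000_000, 333_333, 2 ** -20)
assert exposure_bound(1_000_000, 333_333, s_needed) <= 2 ** -20
```

Since the bound shrinks by a factor of $t/n$ per extra hop, $s$ grows only logarithmically in both $n$ and the inverse of the target probability.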

The above shows that if the honest users do not abort by round r, then at 
most d of the honest users’ real messages were dropped before round r. We can 
thus claim privacy at round r, with the number of honest messages being at 
least n — t — d, just as we did in Sect. 4. Nothing prevents the adversary from 
dropping as many messages as it likes after round r, but doing so cannot degrade 
the privacy already achieved by round r. 


Efficient Implementation of Stage 3. In stage 3 we need a distributed pro- 
tocol with the property that if any honest user holds cheat = 1 then all honest 
users output 1. While this can be achieved using n executions of secure broad- 
cast, doing so would be inefficient and is overkill for our purposes; in particular, 
it is acceptable for us if the adversary causes disagreement among the honest 
users. We propose the following lightweight protocol that can be based on any 
multisignature scheme. Every user who holds cheat = 0 sends a signature on 
some designated message M to the server. The server then combines these sig- 
natures into a single, constant-size signature, and sends it to every user. Each 
user locally verifies the signature it receives from the server with respect to every
user's public key, and outputs 1 if verification fails (or if it does not receive any
signature from the server). Note that even if all-but-one of the users are
corrupted, an adversary cannot forge a valid multisignature on $M$ unless every
honest party held cheat = 0.


5.1 Performance Analysis


We analyze the communication overhead of the malicious protocol relative to the 
semi-honest protocol for the same privacy guarantees. Using dummy messages 
incurs roughly 2x overhead compared with the semi-honest protocol using the 
same number of rounds. (For simplicity, we do not count the communication in 
stages 2 and 3 which is anyway dominated by the onion routing. In fact, since 
dummy messages are not routed in stage 4, the communication overhead is less 
than 2x of the semi-honest protocol with the same number of rounds.) However, 
since the total number of rounds must be increased in the malicious setting, the 
overall communication overhead is higher. (Note that the total communication 
complexity is quadratic in the number of rounds since the length of each onion 
encryption is linear in the number of rounds.) 


Zero $\epsilon$. For the values of $t, \delta$ in Fig. 2, we need to set $s$ equal to anywhere
from 5% to 52% of $r$. This results in a total communication overhead of
2.2–4.6x compared to the semi-honest protocol.


Non-zero $\epsilon$. For the parameters in Fig. 3, we need to set $s$ equal to anywhere
from 30% to 80% of $r$. This results in a total communication overhead of 3.4–6.4x
compared to the semi-honest protocol.


Comparison to Prior Work. For $n = 1{,}000{,}000$ users, $t = n/3$, and to achieve
$(0, 2^{-40})$-differential privacy, our malicious protocol requires $r + s = 212$ rounds
and 1.5 MB communication per party. In comparison, for 1,000,000 parties and
$t = n/5$, we estimate that the protocol of Bell et al. [6] costs 12 rounds and
communication of 199 KB per party. While the performance of our protocol is inferior, we note that
in practice, often worse privacy parameters are chosen, and our protocol would
then out-perform that of Bell et al. For example, if $(1.25, 2^{-20})$-differential
privacy suffices and $t = n/3$, our per-party communication cost reduces to 169 KB
using only 70 rounds. If $(0.454, 2^{-20})$-differential privacy suffices and $t = n/5$,
our per-party communication cost reduces to 70.9 KB using only 45 rounds.

Finally, if a DO shuffle is used in applications beyond the privacy blanket,
we compare even more favorably when the input domain size is larger than
$O(n^{1/3})$. Specifically, our communication cost per party grows logarithmically in
the domain size, while theirs either grows linearly in the domain size, or
superlinearly in $n$.


Abstract. Bootstrapping parameters for the approximate
homomorphic-encryption scheme of Cheon et al., CKKS (Asiacrypt 17), 
are usually instantiated using sparse secrets to be efficient. However, using 
sparse secrets constrains the range of practical parameters within a tight 
interval, as they must support a large enough depth for the bootstrapping 
circuit but also be secure with respect to the sparsity of their secret. 

We present a bootstrapping procedure for the CKKS scheme that com- 
bines both dense and sparse secrets. Our construction enables the use of 
parameters for which the homomorphic capacity is based on a dense secret, 
yet with a bootstrapping complexity that remains the one of a sparse secret 
and with a large security margin. Moreover, this also enables us to easily 
parameterize the bootstrapping circuit so that it has a negligible failure 
probability that, to the best of our knowledge, has never been achieved 
for the CKKS scheme. When using the parameters of previous works, our 
bootstrapping procedures enable a faster execution with an increased pre- 
cision and lower failure probability. For example, we are able to bootstrap 
a plaintext of $\mathbb{C}^{32768}$ in 20.2 s, with 32.11 bits of precision, 285 remaining
modulus bits, a failure probability of $2^{-138.7}$, and 128-bit security.


Keywords: Fully Homomorphic Encryption · Bootstrapping ·
Implementation


1 Introduction 


1.1 The CKKS Scheme 


The CKKS scheme by Cheon et al. [10] is a leveled ring learning with errors 
(R-LWE) [23] homomorphic-encryption scheme that enables approximate arith- 
metic over vectors of complex numbers. Since its introduction, this scheme has 


J.-P. Bossuat, J. Troncoso-Pastoriza—Part of this work was carried out at EPFL. 


© Springer Nature Switzerland AG 2022 
G. Ateniese and D. Venturi (Eds.): ACNS 2022, LNCS 13269, pp. 521-541, 2022. 
https://doi.org/10.1007/978-3-031-09234-3_26 




grown in popularity, as it is currently the most efficient scheme for performing encrypted
floating-point arithmetic. Ciphertexts are tuples of $R_Q = \mathbb{Z}_Q[X]/(X^N + 1)$, with
the main cryptographic parameters being the polynomial-ring degree $N$ and its
modulus $Q$; for a given security parameter $\lambda$ and fixed $N$, an upper bound on
$Q$ can be derived (a smaller $Q$ leads to a more secure instance).

A fresh CKKS ciphertext is of the form $(c_0, c_1) = (-as + m + e, a) \in R_Q^2$ for
$a$ a random polynomial, $s$ and $e$ low-norm secret polynomials and $m$ a message
polynomial. The decryption is obtained by evaluating $\langle (c_0, c_1), (1, s) \rangle = m + e$.
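
The encryption and decryption equations can be exercised on a toy instance. The sketch below is an illustration under assumptions, not the CKKS implementation: it uses a tiny ring degree and modulus that offer no security, treats the message as an integer polynomial (omitting the CKKS encoding and scaling factor), and all function names are hypothetical.

```python
import random


def negacyclic_mul(a, b, q):
    """Multiply a and b in Z_q[X]/(X^N + 1), where N = len(a) = len(b)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q  # X^N = -1
    return out


def centered(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def encrypt(m, s, q, rng):
    """Return ((c0, c1), e) with c0 = -a*s + m + e and c1 = a; e is exposed only for checking."""
    n = len(m)
    a = [rng.randrange(q) for _ in range(n)]
    e = [rng.choice([-1, 0, 1]) for _ in range(n)]
    a_s = negacyclic_mul(a, s, q)
    c0 = [(-x + mi + ei) % q for x, mi, ei in zip(a_s, m, e)]
    return (c0, a), e


def decrypt(ct, s, q):
    """Evaluate <(c0, c1), (1, s)> = c0 + c1*s, which equals m + e modulo q."""
    c0, c1 = ct
    c1_s = negacyclic_mul(c1, s, q)
    return [centered(x + y, q) for x, y in zip(c0, c1_s)]


rng = random.Random(7)
q, n = 2 ** 20, 8
s = [rng.choice([-1, 0, 1]) for _ in range(n)]
m = [100, -50, 7, 0, 3, -3, 42, 1]
ct, e = encrypt(m, s, q, rng)
assert decrypt(ct, s, q) == [mi + ei for mi, ei in zip(m, e)]
```

Decryption returns $m + e$ rather than $m$, which is why the scheme is described as approximate: the small error is treated as part of the numerical noise of the computation.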

A message m is encrypted at the modulus Q (maximum level) and each 
subsequent multiplication consumes a level and reduces the size of the modulus 
Q. Hence, the upper bound on Q fixes the maximum homomorphic capacity of 
fresh ciphertexts (the maximum circuit’s depth). Once a ciphertext reaches its 
smallest possible modulus q, it can be bootstrapped back to a larger modulus, 
thus enabling the evaluation of arbitrary-depth circuits. 


1.2 Bootstrapping 


The bootstrapping procedure for the CKKS scheme was first proposed by Cheon
et al. [8] and can be summarized in four steps: (i) ModRaise: raise the ciphertext,
currently at its smallest modulus $q$, back to its highest modulus $Q$. (ii) CoeffsToSlots:
homomorphically evaluate the canonical embedding $\tau$. (iii) EvalMod:
homomorphically evaluate a modular reduction, approximated by the scaled sine
function $\frac{q}{2\pi} \cdot \sin(2\pi x/q)$. (iv) SlotsToCoeffs: homomorphically evaluate $\tau^{-1}$.
The procedure outputs a ciphertext at modulus $Q'$ with $Q > Q' > q$, the
difference between $Q'$ and $q$ being the residual homomorphic capacity after the
bootstrapping. Cheon et al. evaluate $\tau$ with a matrix (plaintext) × vector (encrypted)
multiplication and compute the scaled sine function using the Taylor series of
$e^{ix}$ followed by an extraction of the imaginary part to retrieve $\sin(x)$.
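
The reason a scaled sine can stand in for modular reduction is easiest to see numerically, in the clear. The sketch below is only an illustration with hypothetical names and toy values: for inputs of the form $x = kq + m$ with $|m|$ much smaller than $q$, the scaled sine returns approximately $m$, and the approximation degrades as $|m|$ approaches $q/4$.

```python
import math


def scaled_sine(x, q):
    """q/(2*pi) * sin(2*pi*x/q): the periodic approximation of reduction modulo q."""
    return q / (2 * math.pi) * math.sin(2 * math.pi * x / q)


def centered_mod(x, q):
    """Exact reduction of x to the representative in [-q/2, q/2)."""
    return (x + q / 2) % q - q / 2


q = 2 ** 20
small = 1000
for k in range(-5, 6):
    x = k * q + small
    # Near multiples of q the two functions agree closely.
    assert abs(scaled_sine(x, q) - centered_mod(x, q)) < 0.01

# Far from a multiple of q the approximation is poor.
assert abs(scaled_sine(q / 4, q) - centered_mod(q / 4, q)) > 1000
```

The leading error term is cubic in $m$, which is why the message must stay small relative to $q$ for the approximation to be useful.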

Extensive works have since improved the efficiency of the original procedure 
of Cheon et al. The first improvement was proposed by Chen et al. [4]. In their 
work, they improved the efficiency of the homomorphic evaluation of $\tau$ by
multiple orders of magnitude by adopting an FFT-like approach instead of a single
matrix-vector multiplication. They also proposed a more efficient polynomial
approximation by directly approximating $\sin(x)$ with a Chebyshev interpolant.
In a concurrent work, Cheon et al. [6] proposed a similar technique to improve
the evaluation of $\tau$.

These works were followed by efforts aimed at improving the homomorphic 
modular reduction, which is the most difficult step of the bootstrapping since 
the CKKS scheme does not support the evaluation of non-polynomial functions. 
Han and Ki [13] proposed a polynomial interpolation that takes into account the 
distribution of the message and uses a scaled cosine to enable the double angle 
formula, which allowed them to greatly reduce the degree of the interpolant. 
Lee et al. [19] proposed a modified multi-interval Remez algorithm to find the 
optimal minimax approximation of the scaled sine/cosine functions, and used 
the inverse sine function to remove the error introduced by the approximation of 
the ideal modular function by a trigonometric function. Lee et al. [20] proposed
a polynomial interpolation that minimizes the variance of the interpolant, thus
reducing the error introduced by the homomorphic evaluation of the polynomial. 
Jutla and Manohar [15] proposed a novel variant of the Lagrange interpolation 
that allowed them to directly approximate the modular reduction function, with- 
out having to rely on trigonometric functions. They also proposed [16] to use a 
sine series to approximate the modular reduction and achieved a much higher 
precision than the previous works. Lee et al. [21] proposed a polynomial approx- 
imation method for the modular reduction based on the L2-norm minimization. 
Similarly to Jutla and Manohar, their technique allows them to avoid using 
trigonometric functions and directly approximate the modular reduction. 

Bossuat et al. [3] proposed a more efficient algorithm to evaluate general 
linear transformations and a polynomial evaluation algorithm that preserves the 
ciphertext scale and does not introduce rescaling errors, as well as several other 
smaller improvements to the bootstrapping procedure. They show that, when 
combined, these improvements lead to a bootstrapping an order of magnitude 
more efficient than the previous works. Additionally, they proposed the first 
practical instance of a bootstrapping with a dense secret as well as the first 
open source implementation [18] of the bootstrapping for the full-RNS (Residue 
Number System) variant of the CKKS scheme [7]. Finally, Yu and Hayato [14] 
proposed a more efficient way to evaluate the trace function. 

Put together, these works improved the bootstrapping procedure to be orders 
of magnitude more efficient and precise than the original proposal by Cheon et 
al. However, the bootstrapping circuit has fundamentally remained unchanged 
since its first introduction. One of its limitations is its high sensitivity to the 
density h of the secret s: the larger h is (the more non-zero elements s has), the 
more complicated the EvalMod step is and the higher depth the bootstrapping 
requires. The density h also has an impact on the bootstrapping failure probabil- 
ity. Indeed, the magnitude of the plaintext coefficient on which the homomorphic 
modular reduction must be applied is a function of h, and if a single coefficient 
falls outside of the approximation interval (a ciphertext typically encrypts $2^{14}$
to $2^{16}$ values), the bootstrapping procedure fails and returns unusable values.
Bossuat et al. [3] observed that commonly used bootstrapping parameters have a 
high failure probability and it is only recently that works have started to quantify 
and mitigate this failure probability. 

For these reasons, bootstrapping procedures are instantiated with sparse 
secrets (with small h). But recent improvements on attacks targeting sparse 
secrets [9, 11, 24] have reduced the upper bound on the modulus $Q$; consequently,
parameters using sparse keys must be regularly updated. Being able to mitigate 
this dependency on sparse secrets would therefore be an important step for the 
adoption and practicality of CKKS bootstrapping. 


1.3 Our Contributions 


In this work, we propose a sparse-secret encapsulation technique for the CKKS 
bootstrapping; the technique improves the CKKS bootstrapping security and
efficiency by taking advantage of the security margin provided by using
evaluation keys at a small modulus. Our main contributions can be summarized as
follows:


Minimized Security-Dependency on Sparse Secrets. The leveled property 
of the CKKS scheme is tightly related to its security, as the security of an R- 
LWE sample is notably based on the size of its modulus Q. For a fixed ring 
degree $N$ and a security parameter $\lambda$, an upper bound for $Q$ is derived and the
public keys are generated using this $Q$. Although R-LWE samples at modulus
$Q$ have security $\lambda$, previous works did not take into account that elements at a
lower level have a proportionally smaller modulus, and hence higher security.

We propose a modification to the bootstrapping circuit that enables the 
generation of all evaluation keys using a secret that is independent from the 
one on which the complexity of the EvalMod step is based. Instead of the usual 
single secret instance, our bootstrapping instance uses an additional ephemeral 
secret that determines the complexity of the bootstrapping procedure. As such,
the maximum modulus Q no longer depends on the sparse secret that
defines the complexity of the EvalMod step, hence a denser secret can be used
to generate the evaluation keys.

This construction has a two-fold benefit: (i) It increases the flexibility to choose 
bootstrapping parameters, such as ones with a denser secret and larger homomor- 
phic capacity, as the security of all evaluation keys at the maximum modulus is 
based on a different secret than the one defining the circuit complexity. (ii) The 
dependency on the sparse secret is minimized by limiting it to an evaluation key 
at the smallest modulus; we show that this evaluation key also has a large security 
margin against attacks that target sparse (and dense) secrets. 


Negligible Failure Probability. By having the EvalMod step rely on an
ephemeral sparse secret of low h, we are able to greatly reduce the complexity of
this step, allowing for a higher precision and a much lower failure probability. In
fact, we can easily make this failure probability smaller than $2^{-\lambda}$, where $\lambda$ is
the security parameter, which, to the best of our knowledge, has never been achieved before.


Empirical Experiments and Open Source Implementation. We evalu- 
ate our contribution with empirical experiments and provide an open source 
implementation of this work in the Lattigo library [18]. 

The rest of the article is organized as follows: in Sect. 2, we introduce the
notation and recall the necessary background for the CKKS scheme and
its bootstrapping; in Sect. 3, we present our core contributions; in Sect. 4, we
give the security argument that our modification does not introduce any new
security assumption and examine the security of our construction for concrete
parameters; in Sect. 5, we empirically analyse the impact of our contribution
on the noise and bootstrapping precision; in Sect. 6, we evaluate our contribution
with empirical experiments and discuss the implications of our modified
bootstrapping procedure.


2 Background


In this section, we introduce the notation used in the rest of this paper, as well 
as the necessary technical background related to our contribution. 


2.1 Notation 


For N, a fixed power of two, let $R_Q = \mathbb{Z}_Q[X]/(X^N + 1)$ be the cyclotomic
polynomial ring over the integers modulo Q with coefficients in $[-\lfloor Q/2 \rfloor, \lfloor Q/2 \rfloor)$.
Define $Y = X^{N/2n}$ for n some power of two smaller than N (a polynomial in Y
is a polynomial in X with zero coefficients at the degrees that are not multiples of
N/2n). We denote single elements (polynomials or numbers) in italics, e.g., $a$,
and vectors of such elements in bold, e.g., $\mathbf{a}$. We denote $a^{(i)}$ the element at
position i of the vector $\mathbf{a}$ or the degree-i coefficient of the polynomial $a$. We denote
$\|\cdot\|$ the infinity norm, $[\cdot]_Q$, $\lfloor\cdot\rfloor$, $\lfloor\cdot\rceil$ the reduction modulo Q, rounding to the
previous and to the closest integer, respectively (coefficient-wise for polynomials),
and $\langle\cdot,\cdot\rangle$ the inner product.

We define the following distributions over $R_Q$: $\chi_Q$ has coefficients uniformly
distributed over $\mathbb{Z}_Q$. $\chi_h$ has exactly h non-zero coefficients, each uniformly
distributed over $\{-1, 1\}$. $\chi_\sigma$ has coefficients distributed according to a
centered discrete Gaussian distribution with standard deviation $\sigma$. Unless otherwise
specified, $\sigma$ is assumed to be 3.19 (Homomorphic Encryption Standard [1]) and
sampled values are truncated to $[-\lfloor 6\sigma \rfloor, \lfloor 6\sigma \rfloor]$. We denote the act of sampling a
polynomial from a given distribution $\chi$ by $\leftarrow \chi$.
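An R-LWE distribution is parameterized by the tuple $\{N, Q, h, \sigma\}$ and is
sampled as $(-as + e, a) \in R_Q^2$ with $s \leftarrow \chi_h$, $a \leftarrow \chi_Q$ and $e \leftarrow \chi_\sigma$. We say
that a parameter set $\{N, Q, h, \sigma\}$ is $\lambda$-secure if the advantage of an adversary
$\mathcal{A}$ to distinguish between the distribution $(-as + e, a) \in R_Q^2$ and the uniform
distribution $U(R_Q^2)$ is bounded by $2^{-\lambda}$:


$$\mathrm{Adv}_{\mathcal{A}} = \left| \Pr[\mathcal{A}(-as + e, a) = 1] - \Pr[\mathcal{A}(U(R_Q^2)) = 1] \right| \le 2^{-\lambda}.$$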


2.2 Approximate Homomorphic Encryption (CKKS) 


A CKKS plaintext is a polynomial $m(Y) \in \mathbb{Z}_Q[Y]/(Y^{2n} + 1)$ (with $X^{N/2n} = Y$).
We define the following plaintext encodings:


- The coefficient encoding, for which the message $m \in \mathbb{R}^{2n}$ is directly encoded
  on $\mathbb{Z}_Q[Y]/(Y^{2n} + 1)$ as $m(Y) = \lfloor \Delta m \rceil$, for $\Delta$ a scaling factor.

- The slot encoding, for which the message $m \in \mathbb{C}^{n}$ is subjected to the canonical
  embedding $\tau : \mathbb{C}^{n} \to \mathbb{R}^{2n}$, which preserves the coefficient-wise complex
  arithmetic. The coefficient encoding is then applied to encode the result on
  $\mathbb{Z}_Q[Y]/(Y^{2n} + 1)$.


A CKKS ciphertext ct is an R-LWE sample masking a plaintext polynomial
m: $(c_0, c_1) = (-as + m + e, a) \in R_Q^2$, and the decryption circuit is its evaluation
at the secret s: $\langle (c_0, c_1), (1, s) \rangle = m + e \in R_Q$.
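
The definitions above can be made concrete with a toy sketch. The following Python snippet is only an illustration under simplifying assumptions: a tiny ring degree, a power-of-two modulus, a rounded and truncated Gaussian in place of a true discrete Gaussian, and helper names (`negacyclic_mul`, `sample_sparse`, `encrypt`, `decrypt`) that are ours and do not come from any library. It samples a sparse secret, builds $(c_0, c_1) = (-as + m + e, a)$, and evaluates the ciphertext at the secret.

```python
import random

SIGMA = 3.19
BOUND = int(6 * SIGMA)  # truncation bound floor(6 * sigma) = 19


def centered(x, mod):
    """Representative of x modulo an even `mod` in [-mod/2, mod/2)."""
    r = x % mod
    return r - mod if r >= mod // 2 else r


def negacyclic_mul(a, b, mod):
    """Schoolbook product in Z_mod[X]/(X^N + 1), using X^N = -1."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return [centered(c, mod) for c in out]


def sample_sparse(n, h):
    """Secret with exactly h non-zero coefficients, each -1 or +1."""
    s = [0] * n
    for pos in random.sample(range(n), h):
        s[pos] = random.choice((-1, 1))
    return s


def sample_error(n):
    """Rounded Gaussian truncated to [-BOUND, BOUND] (a stand-in for chi_sigma)."""
    return [max(-BOUND, min(BOUND, round(random.gauss(0, SIGMA)))) for _ in range(n)]


def encrypt(m, s, mod):
    a = [centered(random.randrange(mod), mod) for _ in s]
    e = sample_error(len(s))
    a_s = negacyclic_mul(a, s, mod)
    c0 = [centered(-x + y + z, mod) for x, y, z in zip(a_s, m, e)]
    return c0, a


def decrypt(ct, s, mod):
    """Evaluate <(c0, c1), (1, s)> = c0 + c1*s modulo `mod`."""
    c0, c1 = ct
    c1_s = negacyclic_mul(c1, s, mod)
    return [centered(x + y, mod) for x, y in zip(c0, c1_s)]
```

Decrypting `encrypt(m, s, Q)` returns m plus the small error e, so each coefficient differs from the message by at most `BOUND` as long as the message is much smaller than Q/2.
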

A CKKS switching key $\mathrm{swk}_{QP}^{s \to s'}$ is a vector of R-LWE samples masking a
secret s: $(-a^{(i)} s' + w^{(i)} P s + e^{(i)}, a^{(i)}) \in R_{QP}^2$ for $1 \le i \le \beta$, $\mathbf{w} = (w^{(1)}, \ldots, w^{(\beta)})$
an integer basis decomposition and P a secondary modulus such that $P \approx \|\mathbf{w}\|$.
Note that the security of the R-LWE samples used in the switching keys
is based on the modulus QP. Through the public algorithm KeySwitch, $\mathrm{swk}_{QP}^{s \to s'}$
can be used to homomorphically re-encrypt a ciphertext $\mathrm{ct} = (c_0, c_1)$ to a ciphertext
$\mathrm{ct}' = (c_0', c_1')$ by computing $(c_0', c_1') = (c_0, 0) + \lfloor P^{-1} \cdot \langle \mathbf{w}^{-1}(c_1), \mathrm{swk}_{QP}^{s \to s'} \rangle \rceil$,
where $\mathbf{w}^{-1}(c_1)$ denotes the decomposition of the coefficients of $c_1$ in base $\mathbf{w}$.
The additional modulus P is used to control the magnitude of the error (which
is $\langle \mathbf{w}^{-1}(c_1), \mathbf{e} \rangle$) added during the key-switching. The public encryption key is a
switching key $\mathrm{swk}_{QP}^{0 \to s}$. In addition to the access structure management that this
procedure provides, it is a fundamental building block of the CKKS scheme as it
is used to ensure the correctness and compactness of the decryption circuit for
several core homomorphic operations (e.g. ciphertext-ciphertext multiplication
and homomorphic plaintext-slots cyclic-rotations).


2.3 Bootstrapping 


The bootstrapping procedure of the CKKS scheme [8] aims at raising the ciphertext
to a higher modulus to enable further homomorphic evaluation. More specifically,
upon the input of a ciphertext $\mathrm{ct}_q$ such that $\langle \mathrm{ct}_q, (1, s) \rangle = m(Y) + e$, for s
a secret with h non-zero coefficients, the CKKS bootstrapping outputs a ciphertext
$\mathrm{ct}_{Q'}$ that decrypts to $m'(Y) = m(Y) + e'$, where $Q > Q' > q$ for Q the
maximum modulus, Q' the modulus after the bootstrapping, and q the modulus
before the bootstrapping. It is important to note that $\|e'\| > \|e\|$. This implies
that, although this procedure is referred to as bootstrapping, its aim is not to
reduce the underlying error but to enable further computations. The procedure
consists of the following four steps: ModRaise, CoeffsToSlots, EvalMod, and
SlotsToCoeffs. We now briefly explain them, omitting the error terms for clarity.


ModRaise: the exhausted ciphertext, whose modulus is q, is expressed in the
modulus Q > q. Note that this step does not modify the coefficients of the
ciphertext (thus has no effect on the error), as it only represents them in a different
RNS basis. This yields a ciphertext that decrypts to
$[c_0 + s c_1]_Q = q \cdot I(X) + m(Y) = m'(Y)$, where $q \cdot I(X) = [-[s c_1]_q + s c_1]_Q$ is an integer polynomial
that represents the extra multiples of q not removed by the reduction modulo Q
(because Q > q). Note that $\|I(X)\| < h$ (s has h non-zero coefficients).
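
To make the origin of I(X) tangible, the sketch below computes it directly for toy parameters. It is an illustration only: it assumes Q is large enough that no wrap-around modulo Q occurs, it isolates the contribution of the product $s c_1$ (so that $q \cdot I(X) = s c_1 - [s c_1]_q$ over the integers), and the helper names are hypothetical.

```python
import random


def centered(x, q):
    """Representative of x modulo an even q in [-q/2, q/2)."""
    r = x % q
    return r - q if r >= q // 2 else r


def sparse_times_poly(support, c1):
    """Integer product s*c1 in Z[X]/(X^N + 1); `support` maps position -> +1 or -1."""
    n = len(c1)
    out = [0] * n
    for pos, sign in support.items():
        for j, cj in enumerate(c1):
            if pos + j < n:
                out[pos + j] += sign * cj
            else:
                out[pos + j - n] -= sign * cj  # X^N = -1
    return out


def overflow_polynomial(support, c1, q):
    """Coefficients of I(X), defined by q*I(X) = s*c1 - [s*c1]_q."""
    return [(v - centered(v, q)) // q for v in sparse_times_poly(support, c1)]


def demo(n=1024, h=64, q=1 << 40, seed=0):
    """Largest |coefficient| of I(X) for a random sparse secret and uniform c1."""
    rng = random.Random(seed)
    support = {pos: rng.choice((-1, 1)) for pos in rng.sample(range(n), h)}
    c1 = [centered(rng.randrange(q), q) for _ in range(n)]
    return max(abs(c) for c in overflow_polynomial(support, c1, q))
```

Each coefficient of $s c_1$ is a signed sum of h values of magnitude at most q/2, so in this simplified setting every coefficient of I(X) is bounded by (h+1)/2 in absolute value; the value returned by `demo()` is typically far below that worst case.
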

If $2n \neq N$ (sparse packing), then $Y \neq X$ and I(X) is not a polynomial in Y.
In other words, we have multiples of q in the coefficients of X whose degrees are not
multiples of N/2n. In this case, we can map $q \cdot I(X) + m(Y)$ to $(N/2n) \cdot (q \cdot I(Y) + m(Y))$
by evaluating a trace-like map $X \mapsto \sum_{i=0}^{N/2n-1} (-1)^i X^{i \cdot 2n + 1}$ [8] that zeroes those
coefficients of X whose degree is not a multiple of N/2n, and multiplies the
others by N/2n.

The remaining steps of the bootstrapping remove this unwanted $q \cdot I(Y)$
polynomial by homomorphically evaluating a modular reduction by q on $m'(Y)$.


CoeffsToSlots: the canonical embedding $\tau$ is homomorphically evaluated on
$m'(Y)$. Indeed, $m'(Y)$ can be seen as a fresh message in the coefficient domain.
To enable the parallel (slot-wise) evaluation, it needs to be encoded in the slot
domain.


EvalMod: a polynomial approximation of the function $f(x) = x \bmod q$ is homomorphically
evaluated on $m'(Y)$, thus removing the unwanted I(Y) polynomial.
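
As a numerical illustration of why a smooth function can stand in for the modular reduction, the sketch below uses a scaled sine. This is an assumption-laden toy: it calls `math.sin` directly instead of the polynomial interpolant that is evaluated homomorphically, and the values of q, m and the overflow are arbitrary.

```python
import math


def scaled_sine(x, q):
    """(q / 2*pi) * sin(2*pi*x / q): close to x mod q when x is near a multiple of q."""
    return q / (2 * math.pi) * math.sin(2 * math.pi * x / q)


def cubic_error_bound(m, q):
    """Leading term of |scaled_sine(m, q) - m|, from sin(t) = t - t^3/6 + ..."""
    return (2 * math.pi) ** 2 * abs(m) ** 3 / (6 * q ** 2)


q = 2 ** 20
m, overflow = 200, 7  # toy message coefficient and toy multiple of q
approx = scaled_sine(m + q * overflow, q)
print(abs(approx - m), cubic_error_bound(m, q))
```

The periodicity of the sine removes the multiple of q, and the remaining error shrinks cubically in m/q, which is why the message must be kept small relative to q for this approximation to be accurate.
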


SlotsToCoeffs: the inverse of the canonical embedding, $\tau^{-1}$, is evaluated on $m'(Y)$
and a close approximation (recall that $\|e'\| > \|e\|$) of the original message m(Y),
minus the unwanted polynomial, is retrieved. After this last step, the ciphertext
has modulus Q' > q and we can evaluate further operations, until it reaches
modulus q and a new bootstrapping is needed.


3 Proposed Technique 


Our contribution is based on two observations: (i) The complexity of the EvalMod
step is determined by the secret distribution of the ciphertext during the
ModRaise step (we further specify this dependency in Sect. 3.1). (ii) The leveled
behavior of the CKKS scheme positively affects its security, i.e., ciphertexts
entering the ModRaise procedure are at a low level, and a sparser secret can be
used for the same security.

We use these observations to modify the ModRaise step of the bootstrapping
by encapsulating it between two KeySwitch procedures: the first one switches
the low-level ciphertext to a sparser secret $\tilde{s}$ before the ModRaise and the second
one, after the ModRaise, switches the high-level ciphertext back to a dense
secret s.

We now detail the original ModRaise procedure and the improvement we
bring to it.


3.1 Original ModRaise and Bootstrapping Failure Probability 


The original ModRaise (see Sect. 2.3) takes a ciphertext $\mathrm{ct} = (c_0, c_1) \in R_q^2$ that
decrypts to m(Y), a polynomial of 2n coefficients, and outputs a new ciphertext
$\mathrm{ct}' = (c_0', c_1') \in R_Q^2$ that decrypts to a new message of the form $q \cdot I(Y) + m(Y) + e$.

The infinity norm of the polynomial I(Y) is upper-bounded by the Hamming
weight h of the secret, hence the EvalMod step has to evaluate a polynomial
approximation of the modular reduction in the interval $[-h, h]$. However, this
upper bound h can be quite large; since I(Y) follows an Irwin-Hall distribution
[19], we have that $\|I(Y)\|$ is $O(\sqrt{h})$ with high probability [8] and, in practice,
a smaller probabilistic bound K < h is used instead. Given that the ciphertext
encrypts a message m(Y) with $Y = X^{N/2n}$ under a secret $s \leftarrow \chi_h$ before the
ModRaise, the exact probability $f(K, h, n) = \Pr[\|I(Y)\| > K]$ can be computed
by adapting the cumulative probability function of the Irwin-Hall distribution [3]:


$$f(K, h, n) = 1 - \left( \frac{2}{(h+1)!} \sum_{i=0}^{\lfloor K + 0.5(h+1) \rfloor} (-1)^i \binom{h+1}{i} \left( K + 0.5(h+1) - i \right)^{h+1} - 1 \right)^{2n} \qquad (1)$$


We refer to f(K, h, n) as the bootstrapping failure probability, i.e., the probability
that at least one coefficient of I(Y) falls outside of the approximation interval
$[-K, K]$. Indeed, when such an event happens, the procedure returns unusable values.
For example, if we upper bound the failure probability to $f(K, h, n) \le 2^{-15}$
for a fixed $n = 2^{15}$ slots and variable h, then $\lim_{h \to \infty} K \approx 1.81\sqrt{h}$ [3].
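
The failure probability can be evaluated numerically. The sketch below is our own illustration, not the authors' code: it assumes that each of the 2n coefficients of I(Y) behaves as an independent, centered sum of h + 1 uniform variables on [-0.5, 0.5], uses exact rational arithmetic for the Irwin-Hall cumulative function (the alternating sum cancels badly in floating point), and exploits symmetry to compute the per-coefficient tail as $2F(0.5(h+1) - K)$. It requires K < (h+1)/2; otherwise the tail is exactly zero and the logarithm is undefined.

```python
from fractions import Fraction
from math import comb, expm1, factorial, floor, log1p, log2


def irwin_hall_cdf(x, m):
    """Exact CDF at x of the sum of m independent uniforms on [0, 1]."""
    if x <= 0:
        return Fraction(0)
    if x >= m:
        return Fraction(1)
    total = sum((-1) ** i * comb(m, i) * (x - i) ** m for i in range(floor(x) + 1))
    return Fraction(total) / factorial(m)


def log2_failure_probability(K, h, n):
    """log2 of Pr[at least one of 2n coefficients lies outside [-K, K]]."""
    m = h + 1
    # Per-coefficient two-sided tail, using the symmetry 1 - F(x) = F(m - x).
    tail = float(2 * irwin_hall_cdf(Fraction(m, 2) - K, m))
    # 1 - (1 - tail)^(2n), computed without catastrophic cancellation.
    failure = -expm1(2 * n * log1p(-tail))
    return log2(failure)


print(log2_failure_probability(12, 32, 2 ** 15))
print(log2_failure_probability(16, 32, 2 ** 15))
```

For a secret of density 32 and $n = 2^{15}$ slots, this evaluates to roughly -34.1 for K = 12 and roughly -138.7 for K = 16, in line with the failure probabilities reported for the ephemeral secret in Table 3.
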

Therefore, the density h of the secret has a two-fold effect on the practicality
of the bootstrapping. On the one hand, the sparser the secret, the smaller the
range of parameters that can securely and efficiently evaluate the bootstrapping
circuit, as a smaller h implies a smaller upper-bound on the modulus Q for a
fixed ring degree N and a security parameter $\lambda$. On the other hand, the denser
the secret, the more levels are required for the EvalMod step. Indeed, this step
homomorphically evaluates a modular reduction on the interval $[-K, K]$, whose
size is, as shown, proportional to $\sqrt{h}$.


3.2 ModRaise with Sparse-Secret Encapsulation 


We instantiate the base scheme, as well as its bootstrapping circuit, with a secret
s of density h such that the R-LWE samples of the keys, under s and at modulus
QP, are at least $\lambda$-secure. We then encapsulate the ModRaise step between two
KeySwitch operations such that the ciphertext is only temporarily switched to a sparser
secret $\tilde{s}$ with density $\tilde{h} < h$, with R-LWE samples under $\tilde{s}$ and at modulus
qp < QP being at least $\lambda$-secure. Consequently, the unwanted polynomial $q \cdot I(Y)$
depends on the distribution of $\tilde{s}$, but the bootstrapping circuit remains evaluated
under s.

For this instantiation, we generate two sets of parameters $\{N, QP, h, \sigma\}$ and
$\{N, qp, \tilde{h}, \sigma\}$, with qp < QP and $\tilde{h} < h$, which are both at least $\lambda$-secure; and we
sample a secret $s \leftarrow \chi_h$, as well as a secret $\tilde{s} \leftarrow \chi_{\tilde{h}}$. Let $\mathrm{swk}_{qp}^{s \to \tilde{s}}$ be a switching
key at modulus qp, which can be used to publicly re-encrypt a ciphertext from
the secret s to the secret $\tilde{s}$. And let $\mathrm{swk}_{QP}^{\tilde{s} \to s}$ be a switching key at modulus QP,
which can be used to publicly re-encrypt a ciphertext from $\tilde{s}$ to s.

Given a ciphertext ct at modulus q encrypted under s, our proposed algorithm
first key-switches ct from s to $\tilde{s}$ using $\mathrm{swk}_{qp}^{s \to \tilde{s}}$. Then, it applies the regular
ModRaise algorithm that expresses its coefficients in a larger modulus. The
ciphertext is now expressed in the modulus Q, but with coefficients whose norm
remains unchanged and bounded by $\lfloor q/2 \rfloor$. Finally, the algorithm key-switches
the ciphertext back to the key s by using $\mathrm{swk}_{QP}^{\tilde{s} \to s}$. We detail our modified
ModRaise in Algorithm 1.
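
Algorithm 1: Encapsulated ModRaise
Input: $\mathrm{ct}_q^{s}$, $\mathrm{swk}_{qp}^{s \to \tilde{s}}$, $\mathrm{swk}_{QP}^{\tilde{s} \to s}$
Output: $\mathrm{ct}_Q^{s}$
1: $\mathrm{ct}_q^{\tilde{s}} \leftarrow \mathrm{KeySwitch}(\mathrm{ct}_q^{s}, \mathrm{swk}_{qp}^{s \to \tilde{s}})$
2: $\mathrm{ct}_Q^{\tilde{s}} \leftarrow \mathrm{ModRaise}(\mathrm{ct}_q^{\tilde{s}}, Q)$
3: $\mathrm{ct}_Q^{s} \leftarrow \mathrm{KeySwitch}(\mathrm{ct}_Q^{\tilde{s}}, \mathrm{swk}_{QP}^{\tilde{s} \to s})$
4: return $\mathrm{ct}_Q^{s}$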


Remark 1. Algorithm 1 is implementation-agnostic, and therefore compatible 
with both the original [10] and the full-RNS variants of the CKKS scheme pro- 
posed by Cheon et al. [7]. If the implementation of the KeySwitch begins with 
a modulus basis extension (for example, from Q to QP), Algorithm 1 can be 
optimized by merging the ModRaise step in the second KeySwitch (now from q 
to QP), such that it essentially becomes two consecutive key-switches. 
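
The three steps of Algorithm 1 can be exercised end to end on toy parameters. The sketch below is a simplified illustration rather than the Lattigo implementation: it assumes a single-digit switching key (β = 1, trivial decomposition basis), power-of-two moduli, auxiliary moduli larger than the ciphertext coefficients, and a rounded Gaussian error; all function names are hypothetical.

```python
import random


def centered(x, mod):
    """Representative of x modulo an even `mod` in [-mod/2, mod/2)."""
    r = x % mod
    return r - mod if r >= mod // 2 else r


def mul(a, b, mod):
    """Schoolbook product in Z_mod[X]/(X^N + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj  # X^N = -1
    return [centered(c, mod) for c in out]


def sparse_secret(n, h):
    s = [0] * n
    for pos in random.sample(range(n), h):
        s[pos] = random.choice((-1, 1))
    return s


def small_error(n, sigma=3.19):
    bound = int(6 * sigma)
    return [max(-bound, min(bound, round(random.gauss(0, sigma)))) for _ in range(n)]


def encrypt(m, s, mod):
    a = [centered(random.randrange(mod), mod) for _ in s]
    e = small_error(len(s))
    c0 = [centered(-x + y + z, mod) for x, y, z in zip(mul(a, s, mod), m, e)]
    return c0, a


def decrypt(ct, s, mod):
    return [centered(x + y, mod) for x, y in zip(ct[0], mul(ct[1], s, mod))]


def gen_switching_key(s_from, s_to, mod, aux):
    """Toy single-digit key: (-a*s_to + aux*s_from + e, a) modulo mod*aux."""
    big = mod * aux
    a = [centered(random.randrange(big), big) for _ in s_to]
    e = small_error(len(s_to))
    b = [centered(-x + aux * y + z, big) for x, y, z in zip(mul(a, s_to, big), s_from, e)]
    return b, a


def key_switch(ct, swk, mod, aux):
    """(c0, 0) + round(aux^-1 * c1 * swk), reduced modulo `mod`."""
    c0, c1 = ct
    big = mod * aux

    def scale_down(v):
        return [centered((2 * x + aux) // (2 * aux), mod) for x in v]

    d0 = scale_down(mul(c1, swk[0], big))
    d1 = scale_down(mul(c1, swk[1], big))
    return [centered(x + y, mod) for x, y in zip(c0, d0)], d1


def encapsulated_mod_raise(ct, swk_down, swk_up, q, p, Q, P):
    ct = key_switch(ct, swk_down, q, p)  # s -> sparse ephemeral secret, modulus q
    ct = (list(ct[0]), list(ct[1]))      # ModRaise: same coefficients, read modulo Q
    return key_switch(ct, swk_up, Q, P)  # ephemeral secret -> s, modulus Q
```

Decrypting the output under s at modulus Q yields the message plus a small error plus q times an integer polynomial; in this toy setting the size of that polynomial is governed by the density of the ephemeral secret, not by the density of s.
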


3.3 Impact on the Evaluation-Key Generation 


Our modification to the bootstrapping slightly changes how evaluation keys are 
generated, as we now need to generate two sets of evaluation keys instead of one: 


1. A set parameterized by $\{N, QP, h, \sigma\}$ that uses a key s and comprises the
   encryption key, all the necessary evaluation keys for the linear transformations
   and homomorphic modular reduction, as well as the switching key $\mathrm{swk}_{QP}^{\tilde{s} \to s}$.

2. A set parameterized by $\{N, qp, \tilde{h}, \sigma\}$, with qp < QP and $\tilde{h} < h$, that
   uses a secret $\tilde{s}$ and comprises the switching key $\mathrm{swk}_{qp}^{s \to \tilde{s}}$.


Although we increase the number of evaluation keys by two, this is only
marginal with respect to the total number of switching keys needed for the
bootstrapping; this is largely dominated by the number of rotation keys needed
for the linear transformations, which is in the order of a hundred for $n = 2^{15}$
slots.

Our construction allows us to use a dense secret for s and to instantiate all
the evaluation keys at a larger modulus, which will inevitably increase their size.
We stress, however, that the increase in size of the key set is a normal behavior of
the scheme as it is directly related to the homomorphic capacity of a parameter
set.

In Sect. 4, we show that the modification to the ModRaise algorithm and
the addition of the switching keys $\mathrm{swk}_{QP}^{\tilde{s} \to s}$ and $\mathrm{swk}_{qp}^{s \to \tilde{s}}$ do not introduce new
security assumptions and that our construction is secure.


4 Security Analysis 


In this section, we provide a security argument for Algorithm 1 (Sect. 3.2) that 
shows our modification neither changes nor introduces new security assumptions 
to the CKKS scheme or its original bootstrapping. We then discuss the benefit
on the security of using a small ephemeral secret during the bootstrapping and
estimate the security of such ephemeral secrets for concrete parameters.

Note that, regardless of our proposition, users should always be aware of the 
security implications of using the CKKS scheme [22] as well as sparse secrets, 
and that they should carefully choose how it is parameterized. 


4.1 Security of the Modified ModRaise 


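We consider an adversary $\mathcal{A}$ who has access to the public transcript of
Algorithm 1:

- $\mathrm{ct}_q^{s}$, an R-LWE sample $(-as + m + e, a) \in R_q^2$ with m a message, and security
  parameterized by the tuple $\{N, q, h, \sigma\}$.

- $\mathrm{swk}_{qp}^{s \to \tilde{s}}$, a switching key composed of a set of R-LWE samples
  $(-a^{(i)} \tilde{s} + w^{(i)} p s + e^{(i)}, a^{(i)}) \in R_{qp}^2$ with $a^{(i)} \leftarrow R_{qp}$, $\tilde{s} \leftarrow \chi_{\tilde{h}}$, $e^{(i)} \leftarrow \chi_\sigma$ and
  $\mathbf{w} = (w^{(1)}, \ldots, w^{(\beta)})$ a decomposition basis. The security of this set of R-LWE
  samples is parameterized by the tuple $\{N, qp, \tilde{h}, \sigma\}$.

- $\mathrm{swk}_{QP}^{\tilde{s} \to s}$, a switching key composed of a set of R-LWE samples
  $(-a^{(i)} s + w^{(i)} P \tilde{s} + e^{(i)}, a^{(i)}) \in R_{QP}^2$ with $a^{(i)} \leftarrow R_{QP}$, $s \leftarrow \chi_h$, $e^{(i)} \leftarrow \chi_\sigma$ and
  $\mathbf{w} = (w^{(1)}, \ldots, w^{(\beta)})$ a decomposition basis. The security of this set of R-LWE
  samples is parameterized by the tuple $\{N, QP, h, \sigma\}$.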

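$\mathcal{A}$ wins if it can distinguish $(\mathrm{ct}_q^{s}, \mathrm{swk}_{qp}^{s \to \tilde{s}}, \mathrm{swk}_{QP}^{\tilde{s} \to s})$ from the uniform
distribution $U(R_q^2, R_{qp}^{2 \times \beta}, R_{QP}^{2 \times \beta})$ with an advantage greater than $2^{-\lambda}$. Therefore, to
ensure that


$$\mathrm{Adv}_{\mathcal{A}} = \left| \Pr[\mathcal{A}(\mathrm{ct}_q^{s}, \mathrm{swk}_{qp}^{s \to \tilde{s}}, \mathrm{swk}_{QP}^{\tilde{s} \to s}) = 1] - \Pr[\mathcal{A}(U(R_q^2, R_{qp}^{2 \times \beta}, R_{QP}^{2 \times \beta})) = 1] \right| \le 2^{-\lambda},$$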


it suffices to select the parameter sets $\{N, qp, \tilde{h}, \sigma\}$ and $\{N, QP, h, \sigma\}$ to be
at least $\lambda$-secure ($\{N, q, h, \sigma\}$ is naturally at least $\lambda$-secure if $\{N, QP, h, \sigma\}$ is
itself $\lambda$-secure since q < QP). Regarding their joint distribution, the security
argument holds under the assumption of circular security, which is already
required to generate evaluation keys. For a parameterization example, we take
two sets of parameters from the work of Cheon et al. [5]: $\{2^{15}, 2^{881}, 2^{14}, 3.2\}$ and
$\{2^{15}, 2^{431}, 2^{6}, 3.2\}$; both are $\lambda = 128$-bit secure. Note that, in practice, the second
set of parameters $\{N, qp, \tilde{h}, \sigma\}$ has a much smaller qp (e.g. 120 bits) than the
431 bits used in this example, because it is instantiated at the smallest possible
modulus. Hence, this parameter set is actually more secure than the one that
uses the dense key (see Table 1).

Table 1. Parameters' security for the low-level switching key. The modulus of the
switching key is composed of q and an additional modulus p used during the
key-switching. W denotes log(keyspace size), i.e., $\log(\binom{N}{h} \cdot 2^{h})$. The asterisk * indicates
that the estimator failed to provide a result and instead the security was extrapolated.

log(N) | log(qp) | h  | W   | Primal [2] | Dual [2] | Dec [2] | Hybrid-Primal [5] | Hybrid-Dual [5]
16     | 60+61   | 64 | 792 | 368.0      | 340.4    | 376.0   | 260.4             | 317.8
16     | 60+61   | 32 | 427 | 222.7      | 192.5    | 226.6   | 168.5*            | 283.8
15     | 55+56   | 64 | 728 | 309.2      | 415.0    | 315.6   | 217.7             | 227.8
15     | 55+56   | 32 | 395 | 187.9      | 191.5    | 319.0   | 140.9             | 162.2
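
The W column can be recomputed from its definition. The short sketch below assumes base-2 logarithms and rounding up to the next integer, two conventions that are our reading of the table rather than something stated in the caption; the function name is hypothetical.

```python
from math import ceil, comb, log2


def log_keyspace(log_n, h):
    """log2 of the number of secrets with exactly h non-zero coefficients in {-1, 1}."""
    return log2(comb(1 << log_n, h)) + h


for log_n, h in [(16, 64), (16, 32), (15, 64), (15, 32)]:
    print(log_n, h, ceil(log_keyspace(log_n, h)))
```

Under these assumptions the four printed values are 792, 427, 728 and 395, matching the W column.
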


4.2 Minimizing the Use of Sparse Secrets and Achieving Higher 
Security 


Previous works on the CKKS bootstrapping assumed predefined single sparse-secret
parameters and were focused on improving efficiency [4-6, 8, 13-16, 19-21].
The use of a sparse secret was deemed necessary to make the bootstrapping 
sufficiently practical. Although Bossuat et al. [3] showed that using a single 
dense secret can also be practical, this comes at the cost of reduced efficiency 
and precision (due to the need to evaluate a polynomial of several hundred 
coefficients). 

Our work changes this paradigm by, instead, proposing a higher-level change 
that directly removes this constraint. Although simple, our sparse-secret encap- 
sulation brings a significant improvement to the security and practicality of the 
CKKS bootstrapping. It enables the user to instantiate the bootstrapping eval- 
uation keys with a dense secret, which brings more freedom in making choices 
about the parameters and isolates the security assumption related to the sparse- 
secret to a single low-level (small modulus) key. Being at a low level, this key 
benefits from a large security margin against the most recent attacks [9, 24] and
will be, in practice, more secure than the evaluation keys generated with the
dense secret. The result is a more practical and secure CKKS bootstrapping.
Table 1 provides parameterization examples and their security for the low-level 
switching key that uses a sparse secret. 


5 Empirical Noise Analysis 


In this section we quantify the effect of our modification on the noise of the 
bootstrapping procedure. Our modification impacts the noise in two dimensions: 
it slightly modifies the circuit by adding additional key-switching operations and 
it allows the use of denser keys. 

The initial noise of the CKKS scheme is well understood and numerous works
have carried out noise analyses [8, 10, 17, 20]. These works, however, restrict their
analysis to single operations because, when operations are composed, the estimation
of the noise (either as a bound or an average case) becomes less and less
meaningful since the initial exact noise is unknown.

For this reason, noise analysis, especially for complicated circuits such as the
bootstrapping, remains heuristic and yields loose bounds. This is especially true
when other factors besides the initial noise have to be taken into account, such as
polynomial approximations or plaintext distribution. Therefore, noise analysis
for circuits is conducted experimentally in practice.

In this section, we empirically demonstrate the following
propositions:


Proposition 1. The modification to the ModRaise step only adds a small addi- 
tive noise which has a negligible impact on the bootstrapping precision. 


Proposition 2. The noise terms which are a function of the density h of the 
secret quickly dominate the additive noise of the bootstrapping circuit. 


5.1 Proposition 1 


Our modification to the ModRaise step adds two key-switching operations, one
before it and one after it. The noise introduced by the key-switching is additive
and can be minimized to a rounding error if correctly parameterized [13, 17, 20].

The ModRaise step is followed by the CoeffsToSlots step, which homomorphically
evaluates the encoding algorithm. This step is carried out by evaluating
a linear transformation [8] on the ciphertext vector and, for n slots, requires
$O(\sqrt{n})$ plaintext multiplications and $O(\sqrt{r} \log_r(n))$ rotations [12] (which are
key-switching operations), for a radix r < n.

Hence, the two additional key-switching operations performed during the ModRaise step
should only have a negligible impact on the overall additive bootstrapping
error. We verify Proposition 1 with the following empirical experiment: we compare
the error of the reference circuit of Bossuat et al. [3] with that of the same circuit
where only the ModRaise step differs. We use the exact same parameters as the
ones used by Bossuat et al. for all parameter sets, as well as the same secret-key
density. For our modified circuit, the ModRaise step switches the secret to a
different one of the same density h.

Table 2 reports the results of the experiment. We observe that there is no
significant difference between the noise of the original bootstrapping of Bossuat
et al. and our modified circuit. The largest differences come from Sets IV
and V but remain small (0.09 to 0.25 bits of difference). Both can be attributed
to parameters that lead to a bootstrapping instance that is more sensitive to
the distribution of the initial noise (larger secret density) and/or the additive
noise (smaller plaintext scale). We will see in Sect. 6.1 that the reduction in
the EvalMod complexity from a smaller h makes it possible to entirely mitigate this small
loss of precision and even to increase the bootstrapping precision. Note that this
experiment is performed in a worst-case scenario, that is, with fresh ciphertexts.
In practice, a ciphertext input to the bootstrapping procedure has accumulated the
error of all the previous homomorphic operations, thus the noise from the two
additional key-switching operations, which is additive, has a far lower impact.

Table 2. Impact of the modified ModRaise on the bootstrapping precision, by comparison
between the results of Bossuat et al. [3] and the same bootstrapping circuit
but with the modified ModRaise. Our work uses identical parameters to the ones of
Bossuat et al. for all sets and our modified ModRaise switches the secret to a different
secret of same density h. N is the ring degree, log(QP) the bit-size of the modulus of the
switching keys, h the density of the secret, n the number of plaintext slots, K the probabilistic
upper bound of $\|I(Y)\|$, $d_{\sin(x)}$ the degree of the scaled cosine interpolant (Han and
Ki's method [13]), r the number of double angle evaluations, $d_{\arcsin(x)}$ the degree of
the arcsine interpolant (Taylor series) and $\log(\varepsilon^{-1})$ the negative log of the error, which
is interpreted as the plaintext precision.

Set [3] | log(N) | log(QP) | h     | log(n) | K   | d_sin(x) | r | d_arcsin(x) | log(ε^-1) [3] | log(ε^-1) Ours
I       | 16     | 1546    | 192   | 15     | 25  | 63       | 2 | 0           | …             | …
        |        |         |       | 14     |     |          |   |             | 26.00         | 26.07
II      | 16     | 1547    | 192   | 15     | 25  | 63       | 2 | 7           | …             | …
        |        |         |       | 14     |     |          |   |             | 31.60         | 31.68
III     | 16     | 1553    | 192   | 15     | 25  | 63       | 2 | 0           | …             | …
        |        |         |       | 14     |     |          |   |             | 18.90         | 18.92
IV      | 16     | 1792    | 32768 | 15     | 325 | 255      | 4 | 0           | 16.80         | 16.65
        |        |         |       | 14     |     |          |   |             | 17.30         | 17.21
V       | 15     | 768     | 192   | 14     | 25  | 63       | 2 | 0           | 15.50         | 15.15
        |        |         |       | 13     |     |          |   |             | 15.40         | 15.29


5.2 Proposition 2 


The error of individual homomorphic operations of the CKKS scheme has been
studied and is well understood [8, 10, 17, 20]. Notably, the error of a decrypted and
decoded message in the CKKS scheme is a function of $\sqrt{h}$, h being the number
of non-zero elements of the secret key. Although the error related to the secret
distribution can be controlled and minimized for most operations with a careful
parameterization and scale management (such as addition, plaintext multiplication
or key-switching), ciphertext multiplication amplifies the error at a much
greater rate because their error terms are compounded. This is specifically the
case for polynomial evaluation, which involves ciphertext exponentiation when
computing the power basis.

We empirically verify this statement with the following experiment: we compare
the precision of the bootstrapping circuit and its different individual parts
for an increasing main secret density h. For the full bootstrapping circuit we use
our modified ModRaise step with an ephemeral secret with $\tilde{h} = 32$ ($\lambda \approx 168$ for
$N = 2^{16}$ and log(qp) = 121, see Table 1 in Sect. 4). The results of this experiment
are in Fig. 1.


[Figure 1: plotted series are Full Circuit, EvalMod, CoeffsToSlots and SlotsToCoeffs]

Fig. 1. Precision of the full bootstrapping circuit and its different individual steps for a
secret s with variable h. The full circuit uses our modified ModRaise with an ephemeral
secret $\tilde{s}$ with $\tilde{h} = 32$. The parameters are $N = 2^{16}$ and $n = 2^{15}$, an initial scale
of $2^{5?}$ and 60-bit moduli (the maximum allowed size) are used for all operations. The
EvalMod parameters are K = 10, $d_{\sin(x)} = 30$, r = 3 and $d_{\arcsin(x)} = 7$. $\log(\varepsilon^{-1})$ is the
negative log of the error, which is interpreted as the precision.


We observe that for operations involving a controlled noise augmentation
(CoeffsToSlots and SlotsToCoeffs can be summarized as sums of plaintext multiplications
and key-switching operations) the initial encryption noise (which
includes the encoding error of both the plaintext vector and plaintext matrices)
is much larger than the noise added by the homomorphic operations and the
decryption process. This results in a constant precision until the noise terms
related to h become dominant, which happens at around $h = 2^{8}$ for CoeffsToSlots
and $h = 2^{14}$ for SlotsToCoeffs. At this point, the line starts to follow the
$\sqrt{h}$ relationship (doubling h induces a loss of precision of 0.5 bits).
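
The 0.5-bit figure follows directly from the square-root dependency. As a short worked step, assuming the error scales as $\varepsilon(h) = c\sqrt{h}$ for a constant c that does not depend on h (our simplification of the dominant noise term):

```latex
\log_2\!\left(\varepsilon(2h)^{-1}\right) - \log_2\!\left(\varepsilon(h)^{-1}\right)
  = -\log_2 \frac{c\sqrt{2h}}{c\sqrt{h}}
  = -\log_2 \sqrt{2}
  = -\tfrac{1}{2}.
```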

As expected, the EvalMod step, which is an operation involving ciphertext
exponentiation, has a noise growth that quickly overcomes the initial noise with a
steady $\sqrt{h}$ relationship that already starts at $h = 2^{4}$. We observed that increasing
the degree of the interpolant actually reduced the precision instead of increasing
it. This means that the expected gain in precision from the additional higher
degree terms of the interpolant was actually cancelled by the error resulting from
higher exponentiation to compute the additional terms of the power basis.

The precision of the full circuit follows that of the EvalMod step with a
stable offset of about 4 bits, confirming that the EvalMod step is the bottleneck
of the bootstrapping circuit precision. This is not surprising, since the EvalMod
step is the only non-linear part of the bootstrapping circuit. This offset is the
result of the composition of the different parts of the bootstrapping circuit and
the compounding of their errors.

The loss of precision caused by a higher secret density can be compensated
by increasing the initial scale $\Delta$ if needed. However, when using the full-RNS
variant of the CKKS scheme [7] this scale cannot be arbitrarily increased, since
the size of the used primes is limited by machine words, which are usually of 64
bits. In practice, the maximum size of the primes is even smaller, typically of
61 bits, to enable more efficient implementations. A solution is to use multiple
words per prime or multiple primes per level, but both will induce an overhead.


5.3 Conclusion 


In this section we empirically showed that: (i) our modification to the ModRaise 
step has a negligible impact on the bootstrapping precision and that (ii) the 
noise term related to the density h of the secret quickly dominates all other 
terms when ciphertext multiplication is involved. Our experiments allow us to 
conclude that our construction, by itself, only has a negligible impact on the 
bootstrapping precision, and that an increased noise when using a main secret 
with a higher density h comes from the inherent noise of the scheme and not 
from our modification of the ModRaise step. These results are further confirmed
by the experimental results shown in Sect. 6.


6 Evaluation 


In this section, we evaluate the performance of our proposed modification against
the recent work of Bossuat et al. [3], which is currently the state of the art in terms
of bootstrapping throughput (number of plaintext bits bootstrapped per second).
We will show that, when using the same parameters, our construction enables a
more efficient and precise bootstrapping with a much lower failure probability.

We implemented our work in the Lattigo library [18]. All benchmarks were
conducted on hardware with the same specifications as the one used by Bossuat
et al. (Windows 10, i5-6600K CPU @ 3.50 GHz, 32 GB of RAM, single threaded),
ensuring fair comparisons. All parameter sets, for all experiments, have a security
of $\lambda \ge 128$.

Table 3. Bootstrapping precision and failure probability when reducing the complexity
of the EvalMod step. The original results of Bossuat et al. [3] are given for reference and
we use the same cryptographic parameter sets for all experiments (which are identical
to the ones in Table 2). n is the number of plaintext slots, h the density of the main
secret, $\tilde{h}$ the density of the ephemeral secret, K the range for the approximation of
the scaled sine function, f(K, h, n) the failure probability function (reported as its
base-2 logarithm), and $\log(\varepsilon^{-1})$ the negative log of the error, which is interpreted as the
plaintext precision. Details about the interpolant used for each set can be found in Table 4.
The security of the ephemeral secret is, for $\tilde{h} = 32$, $\lambda \approx 168$ for Sets I to IV, and
$\lambda \approx 141$ for Set V (see Table 1 in Sect. 4).

                         | Bossuat et al. [3]            | This work                           | This work
Set [3] | h     | log(n) | K   | log2 f    | log(ε^-1) | h~ | K  | log2 f    | log(ε^-1) | h~ | K  | log2 f    | log(ε^-1)
I       | 192   | 15     | 25  | -15.58    | 25.70     | 32 | 12 | -34.11    | 27.32     | 32 | 16 | -138.70   | 26.63
        |       | 14     |     | -16.58    | 26.00     |    |    | -35.11    | 27.39     |    |    | -139.70   | 26.89
II      | 192   | 15     | 25  | -15.58    | 31.50     | 32 | 12 | -34.11    | 32.36     | 32 | 16 | -138.70   | 32.11
        |       | 14     |     | -16.58    | 31.60     |    |    | -35.11    | 32.17     |    |    | -139.70   | 32.04
III     | 192   | 15     | 25  | -15.58    | 19.10     | 32 | 12 | -34.11    | 19.14     | 32 | 16 | -138.70   | 19.13
        |       | 14     |     | -16.58    | 18.95     |    |    | -35.11    | 18.92     |    |    | -139.70   | 18.90
IV      | 32768 | 15     | 325 | -14.90    | 16.80     | 32 | 12 | -34.11    | 23.80     | 32 | 16 | -138.70   | 23.12
        |       | 14     |     | -15.90    | 17.30     |    |    | -35.11    | 24.29     |    |    | -139.70   | 23.62
V       | 192   | 14     | 25  | -16.58    | 15.50     | 32 | 12 | -34.11    | 15.48     | 32 | 16 | -139.70   | 15.45
        |       | 13     |     | -17.58    | 15.40     |    |    | -35.11    | 15.66     |    |    | -140.70   | 15.55

(h~ in the header denotes $\tilde{h}$.)


6.1 Better Precision, Reduced Failure Probability and Smaller 
Interpolant 


The EvalMod step of the bootstrapping procedure evaluates a polynomial
approximation of the modular reduction. The bootstrapping failure probability is given
by the function $f(K, h, n)$ (see Eq. 1 in Sect. 3.1), with $[-K, K]$ the range of the
approximation, $h$ the density of the secret at the moment of the ModUp step
and $n$ the number of plaintext slots. If a coefficient falls outside of the range
$[-K, K]$, the polynomial approximation fails and the bootstrapping procedure
returns unusable values. Hence, the precision of the polynomial approximation
evaluated during the EvalMod step is a trade-off between the degree $d$ of the
approximation (which has an impact on the number of levels consumed during this
step), and the range $K$ of the approximation (which, for a given $d$, determines
both the precision and failure probability of the EvalMod step). So, for fixed $d$,
$h$ and $n$, the greater $K$ is, the smaller the failure probability, but the lower the
precision. Therefore, if $h$ can be reduced, $K$ can be reduced and the precision
increased (up to the inherent precision of the interpolant).

Table 3 compares the precision and failure probability of the bootstrapping
from the original work of Bossuat et al. [3] with ours. Additional information
about the interpolants used can be found in Table 4.

We observe that, for all sets, the bootstrapping precision is either similar
or improved, showing that we are able to achieve a failure probability that is
many orders of magnitude smaller without compromising security or precision.
The largest improvement is for Set IV, which is not surprising since Bossuat
et al. had to use a very large interpolant for their EvalMod step.


Table 4. Interpolants used for the EvalMod step of the experiments of Table 3. $K$ is
the range of the interpolation, $d_{\sin(x)}$ the degree of the scaled cosine interpolant (Han
and Ki's method [13]), $r$ the number of double angle evaluations, and $d_{\arcsin(x)}$ the
degree of the arcsine interpolant (Taylor series).


        | Bossuat et al. [3]                 | This work                          | This work
Set [3] | K   | d_sin(x) | r | d_arcsin(x)   | K  | d_sin(x) | r | d_arcsin(x)    | K  | d_sin(x) | r | d_arcsin(x)
I       | 25  | 63       | 2 | 0             | 12 | 22       | 3 | 0              | 16 | 30       | 3 | 0
II      | 25  | 63       | 2 | 7             | 12 | 24       | 3 | 7              | 16 | 30       | 3 | 7
III     | 25  | 63       | 2 | 0             | 12 | 22       | 3 | 0              | 16 | 30       | 3 | 0
IV      | 325 | 255      | 4 | 0             | 12 | 22       | 3 | 0              | 16 | 44       | 2 | 0
V       | 25  | 63       | 2 | 0             | 12 | 22       | 3 | 0              | 16 | 30       | 3 | 0


The configuration of Bossuat et al. led to a failure probability of $2^{-31.6}$ per
plaintext slot ($2^{-15.6}$ for $n = 2^{15}$ slots). Our failure probability per slot is now
$2^{-50.1}$ ($2^{-34.11}$ for $n = 2^{15}$ slots) for $K = 12$ and $2^{-154.7}$ ($2^{-138.7}$ for
$n = 2^{15}$ slots) for $K = 16$, which is smaller than $2^{-128}$ and hence negligible with
respect to the security parameter. Note that when using Han and Ki's interpolation
method, the interpolant has a minimum degree of $d_{\sin(x)} = 2(K - 1)$, hence $K = 16$
is the maximum value to get a depth $\log(d + 1) \leq 5$ interpolant.
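
The per-slot and whole-ciphertext figures quoted above differ by exactly 16 bits for
$n = 2^{15}$. The following minimal sketch reproduces them under the assumption (our
reading of the numbers, not a statement taken from the text) that the overall failure
probability is a union bound over the $2n$ coefficients of the encoded vector. The
function name is hypothetical.

```python
def log2_total_failure(log2_per_coeff: float, log_n: int) -> float:
    """Union bound over 2n coefficients: log2(2n * p) = log2(p) + log2(n) + 1.

    Assumption of this sketch: the failure events of the 2n coefficients are
    simply summed; this is not the paper's exact f(K, h, n) from Sect. 3.1.
    """
    return log2_per_coeff + log_n + 1


# Per-slot exponents quoted in the text, keyed by the range K.
per_slot = {25: -31.6, 12: -50.1, 16: -154.7}

for K, log2_p in per_slot.items():
    total = log2_total_failure(log2_p, log_n=15)
    print(f"K = {K:2d}: per slot 2^{log2_p}, for n = 2^15 slots 2^{total:.1f}")
```

Under this assumption the three totals come out as the exponents reported for
$n = 2^{15}$ in Table 3, rounded to one decimal.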

For all parameter sets except for Set IV, Bossuat et al. used $K = 25$, $d_{\sin(x)} = 63$
(the degree of the scaled sine interpolant) and $r = 2$ (the number of double
angle evaluations), for a total depth of $6 + 2 = 8$. By reducing $K$ to 12 and 16,
we were originally able to reduce the interpolant degree to $d_{\sin(x)} \approx 40$ for an
equivalent bootstrapping precision. In this configuration, it turns out that $d_{\sin(x)}$
is now small enough to be able to increase $r$ to 3 and further reduce $d_{\sin(x)}$ to
a value equal to or smaller than 31. This allows us to keep the same depth and
precision, but with a more efficient polynomial evaluation, since each double
angle evaluation only needs one multiplication.
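
The depth accounting in the two paragraphs above can be checked with a short sketch.
It assumes that a degree-$d$ interpolant consumes $\lceil \log_2(d+1) \rceil$ levels and
that each double angle evaluation consumes one more level; the function names are
hypothetical and not part of any library.

```python
import math


def min_degree(K: int) -> int:
    """Minimum interpolant degree 2(K - 1) quoted for Han and Ki's method."""
    return 2 * (K - 1)


def poly_depth(d: int) -> int:
    """Levels consumed by a degree-d polynomial: ceil(log2(d + 1))."""
    return math.ceil(math.log2(d + 1))


def evalmod_depth(d: int, r: int) -> int:
    """Polynomial depth plus one level per double angle evaluation."""
    return poly_depth(d) + r


# Configuration of Bossuat et al.: degree 63, r = 2.
print("d = 63, r = 2 ->", evalmod_depth(63, 2))
# Configuration with a smaller range: degree at most 31, r = 3.
print("d = 31, r = 3 ->", evalmod_depth(31, 3))
# K = 16 still fits in depth 5, K = 17 would not.
for K in (12, 16, 17):
    d = min_degree(K)
    print(f"K = {K}: minimum degree {d}, depth {poly_depth(d)}")
```

Both configurations give the same total depth, which is why trading polynomial degree
for one extra double angle evaluation does not cost additional levels.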

In conclusion, Table 3 shows that, when using the parameters of Bossuat et
al. in conjunction with our modified ModRaise step and a small ephemeral secret,
the bootstrapping precision and failure probability can be noticeably improved.


6.2 Higher Bootstrapping Throughput 


The bootstrapping utility is a metric that enables the evaluation of the
performance of a bootstrapping circuit. It is a concept that was first introduced by
Chen et al. [4] as $n \times \text{Levels}/\text{Time}$, for $n$ the number of plaintext slots, Levels the
number of levels available after the bootstrapping, and Time the bootstrapping
complexity represented in CPU time (single threaded). It was then expanded
to the bootstrapping throughput by Bossuat et al. [3], which measures the
number of plaintext bits bootstrapped per second as $n \times \log \epsilon^{-1} \times \log Q'/\text{Time}$, for
$\log \epsilon^{-1}$ the bootstrapping precision and $\log Q'$ the number of modulus bits
available after the bootstrapping. Bossuat et al. use $\log Q'$ instead of the number of
remaining levels because this value is more representative of the actual remaining
homomorphic capacity. Indeed, optimizing a homomorphic circuit often leads to
a dynamic scale, in which case the notion of level does not make sense anymore.

Table 5 reports the bootstrapping throughput of the experiments of Table 3
(along with Table 4). This comparison between the results of Bossuat et al. and
ours shows that our modification allows for better timings, even though two
additional key-switching operations are added to the ModUp step.


Table 5. Comparison of the bootstrapping throughput [3] with
$\log(\text{bits/s}) = \log(n \times \log Q' \times \log \epsilon^{-1}/\text{Time})$, where $n$ is the number of plaintext
slots, $Q'$ the residual modulus after the bootstrapping, $\log \epsilon^{-1}$ the bootstrapping
precision, and Time the CPU cost in seconds.


        |        | Bossuat et al. [3]                         | This work                                  | Ratio
Set [3] | log(n) | log(ε^-1) | log(Q') | Time | log(bits/s) | log(ε^-1) | log(Q') | Time | log(bits/s) | bits/s
I       | 15     | 25.7      | 420     | 23.0 | 23.87       | 26.63     | 420     | 19.9 | 24.13       | 1.19×
I       | 14     | 26.0      | 420     | 16.9 | 23.33       | 26.89     | 420     | 14.9 | 23.56       | 1.17×
II      | 15     | 31.5      | 285     | 23.4 | 23.59       | 32.11     | 285     | 20.2 | 23.82       | 1.17×
II      | 14     | 31.6      | 285     | 16.0 | 23.13       | 32.04     | 285     | 14.5 | 23.30       | 1.12×
III     | 15     | 19.1      | 505     | 18.1 | 24.06       | 19.13     | 505     | 15.9 | 24.24       | 1.13×
III     | 14     | 18.9      | 505     | 13.1 | 23.50       | 18.90     | 505     | 11.9 | 23.64       | 1.10×
IV      | 15     | 16.8      | 410     | 39.2 | 22.70       | 23.12     | 420     | 19.9 | 23.93       | 2.34×
IV      | 14     | 17.3      | 410     | 24.9 | 22.15       | 23.62     | 420     | 14.9 | 23.37       | 2.33×
V       | 14     | 15.5      | 110     | 7.5  | 21.82       | 15.45     | 110     | 5.9  | 22.17       | 1.27×
V       | 13     | 15.4      | 110     | 6.0  | 21.14       | 15.55     | 110     | 4.5  | 21.57       | 1.34×


We observe that all parameter sets of our work have a larger bootstrapping
throughput, with Set IV achieving a throughput that is 2.34× that of Bossuat
et al. This shows that our proposed change to the ModUp step enables a better
bootstrapping throughput in addition to a negligible failure probability (note that
this failure probability is not taken into account in the bootstrapping throughput).
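
The throughput metric of Table 5 can be recomputed directly from its inputs. The
following minimal sketch does so for the first row of Set I ($\log n = 15$); the
function name is hypothetical and the numbers are the rounded values printed in the
table, so small deviations in the last digit are to be expected.

```python
import math


def log2_throughput(log_n: int, log_eps_inv: float, log_q_res: float, time_s: float) -> float:
    """log2 of n * log(eps^-1) * log(Q') / Time, in plaintext bits per second."""
    return log_n + math.log2(log_eps_inv * log_q_res / time_s)


# Set I, log n = 15: (precision, residual modulus bits, seconds) from Table 5.
bossuat = log2_throughput(15, 25.7, 420, 23.0)
ours = log2_throughput(15, 26.63, 420, 19.9)

# The ratio of the two throughputs is 2 to the difference of the logarithms.
ratio = 2 ** (ours - bossuat)

print(f"Bossuat et al.: {bossuat:.2f}  this work: {ours:.2f}  ratio: {ratio:.2f}")
```

The two logarithms agree with the table to two decimals. The ratio recomputed from
these rounded inputs is close to, but not necessarily identical with, the reported
1.19×, presumably because the published inputs are themselves rounded.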


6.3 Dense Key Bootstrapping 


In the previous sections, we evaluated the performance of our modified boot- 
strapping against the results of Bossuat et al., for which the parameters use a 
sparse secret as the main secret. In this section, we evaluate the performance of 
our modified bootstrapping with parameters that use a dense secret as the main 
secret. 

By increasing the density of the main secret from 192 to $N/2$, we are able to
increase $\log QP$ to $\approx 1790$ for $N = 2^{16}$ and $\approx 881$ for $N = 2^{15}$, thus increasing
the remaining homomorphic capacity ($\log Q'$) after the bootstrapping, while still
retaining a security of $\lambda \geq 128$ bits.


Table 6. Bootstrapping throughput [3] of various parameter sets with a $\log QP$ based
on a dense secret as the main secret, with $\log(\text{bits/s}) = \log(n \times \log Q' \times \log \epsilon^{-1}/\text{Time})$,
where $n$ is the number of plaintext slots, $Q'$ the residual modulus after the
bootstrapping, $\log \epsilon^{-1}$ the bootstrapping precision, and Time the CPU cost in seconds.


log(N) | log(QP)    | (h, $\tilde{h}$) | log(n) | log(ε^-1) | log(Q') | Time | log(bits/s)
16     | 1401 + 366 | (N/2, 32)        | 15     | 23.0      | 580     | 25.1 | 24.05
16     | 1483 + 305 | (N/2, 32)        | 15     | 29.8      | 465     | 26.3 | 24.04
16     | 1488 + 305 | (N/2, 32)        | 15     | 17.8      | 745     | 21.5 | 24.26
15     | 768 + 112  | (N/2, 32)        | 14     | 17.3      | 166     | 7.9  | 22.50


Table 6 reports the result of this experiment and shows that despite the
expected and unavoidable loss of precision of $0.5 \cdot \log((N/2)/192) \approx 3.7$ bits for
$N = 2^{16}$ and $\approx 3.2$ bits for $N = 2^{15}$ (see Sect. 5), we are still able to obtain a
similar if not greater bootstrapping throughput than when using a sparse secret as
the main secret. Timings are slightly larger than the ones reported in Table 5
because all operations happen at a higher modulus and are thus more costly.
The parameter set for $N = 2^{15}$ shows a significantly larger bootstrapping
throughput compared to Set V of both Bossuat et al. and our work (1.6× and 1.25×
respectively). The reason is that this parameter set could only accommodate
a small homomorphic capacity when using a sparse secret, and the bootstrapping
precision had to be deliberately tuned down to end up with a meaningful
remaining homomorphic capacity after the bootstrapping. Being able to increase
the homomorphic capacity also allowed us to allocate larger moduli to the
bootstrapping circuit, thus increasing its precision.
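
The precision loss and the throughput ratios quoted in this paragraph follow from
simple arithmetic on the tabulated values. The sketch below recomputes them; the
function names are hypothetical, and the throughput logarithms are taken from the
Set V rows ($\log n = 14$) of Table 5 and the last row of Table 6.

```python
import math


def precision_loss_bits(log_N: int, sparse_h: int = 192) -> float:
    """Loss 0.5 * log2((N/2) / h) when going from density h to density N/2."""
    dense_h = 2 ** (log_N - 1)  # a dense secret has N/2 non-zero coefficients
    return 0.5 * math.log2(dense_h / sparse_h)


def throughput_ratio(log_bits_a: float, log_bits_b: float) -> float:
    """Ratio of two throughputs given their base-2 logarithms."""
    return 2 ** (log_bits_a - log_bits_b)


print(f"loss for N = 2^16: {precision_loss_bits(16):.1f} bits")
print(f"loss for N = 2^15: {precision_loss_bits(15):.1f} bits")

dense = 22.50                   # Table 6, log N = 15
print(f"vs. Bossuat et al., Set V: {throughput_ratio(dense, 21.82):.2f}x")
print(f"vs. this work, Set V:      {throughput_ratio(dense, 22.17):.2f}x")
```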

Although the bootstrapping throughput reported in Table 6 is only slightly
larger than the one shown in Table 5 (with the exception of the parameter set
using $N = 2^{15}$), the parameters used in Table 6 would likely not have to be
updated even if attacks on sparse secrets were improved, because these would
not apply to the main secret (which is dense), and the low-level sparse secret
benefits from a large security margin.


7 Conclusion 


In this work, we have presented a sparse-secret encapsulation technique for the
bootstrapping of the CKKS scheme. We have shown that by temporarily switching
the low-level ciphertext to a sparser secret during the ModRaise step, we can
optimize the efficiency-security trade-off of the bootstrapping circuit, by breaking
the dependency between the sparse-secret security and the largest modulus.
This enables all high-level evaluation keys to use a denser secret, and thus to provide
a greater initial homomorphic capacity and more resilience to attacks targeting
sparse secrets, while still enjoying a lower-complexity bootstrapping. Moreover,
our technique also enables the parameterization of the EvalMod step in an
interval that is large enough to make its failure probability arbitrarily small, which,
to the best of our knowledge, has never been achieved before.

When using the parameters of previous works, our experiments show that 
the proposed modification allows for a 128-bit secure bootstrapping with negli- 
gible failure probability, that also benefits from a greater remaining homomor- 
phic capacity, greater precision, and smaller complexity. Moreover, when using 
a dense secret, our bootstrapping circuit has greater bootstrapping throughput 
than previous state-of-the-art approaches that use a sparse secret, especially for 
small parameters. 

We believe these improvements are a major step forward for the security,
stability, efficiency and reliability of the bootstrapping of the CKKS scheme, which
is a necessary building block to enable high-depth arithmetic circuit evaluation
under encryption.


Abstract. Predictable arguments (PAs), introduced by Faonio, Nielsen and
Venturi [14], are private-coin argument systems where the answer of the
prover can be predicted in advance by the verifier. In this work, we
study predictable arguments with additional privacy properties. While
the authors in [14] showed compilers for transforming PAs into PAs
with the zero-knowledge property, they left the construction of witness
indistinguishable predictable arguments (WI-PA) in the plain model as an
open problem. In this work, we first propose more efficient constructions
of zero-knowledge predictable arguments (ZK-PA) based on trapdoor
smooth projective hash functions (TSPHFs). Next, we consider the
problem of WI-PA construction in the plain model and show how to transform
PA into WI-PA using non-interactive witness-indistinguishable proofs.

As a relaxation of predictable arguments, we additionally put forth a 
new notion of predictability called Commit-and-Prove Predictable Argu- 
ment (CPPA), where except the first (reusable) message of the prover, 
all the prover’s responses can be predicted. We construct an efficient 
zero-knowledge CPPA in the non-programmable random oracle model 
for the class of all polynomial-size circuits. Finally, following the connec- 
tion between predictable arguments and witness encryption, we show an 
application of CPPAs with privacy properties to the design of witness 
encryption schemes, where in addition to standard properties, we also 
require some level of privacy for the decryptors who own a valid witness 
for the statement used during the encryption process. 


Keywords: Predictable arguments · Zero-knowledge · Witness
indistinguishability · Witness encryption


1 Introduction 


Interactive proofs (IPs) and arguments introduced by Goldwasser, Micali, and
Rackoff [19] are cryptographic protocols that allow a prover to convince a verifier
about the veracity of a public statement $x \in \mathcal{L}$, where $\mathcal{L}$ is an NP language. The
interaction may consist of several rounds of communication, at the end of which
the verifier decides to accept or reject the prover's claim on the membership of
$x$ in $\mathcal{L}$. There are two properties required for an IP, namely completeness and
soundness. Completeness means that if $x \in \mathcal{L}$, the honest prover can always
convince the honest verifier. Soundness means that for $x \notin \mathcal{L}$ no (even unbounded)
malicious prover can convince the honest verifier that $x \in \mathcal{L}$. Argument
systems are like IPs, except they are only computationally sound; i.e., it should be
computationally hard (and not impossible) for a malicious prover to convince
the verifier that $x \in \mathcal{L}$. An interactive proof is called public-coin if the verifier
messages are uniformly and independently random, and private-coin otherwise.

Recently, Faonio, Nielsen and Venturi [14] introduced a new property for 
argument systems called predictability. Predictable arguments (PA) are private- 
coin argument systems where the answer of the prover can be predicted effi- 
ciently, given the honest verifier’s (private) random coins. The prover in such 
arguments is deterministic and must be consistent with the unique accepting 
transcript throughout the entire protocol. Faonio et al. [14] formalized this notion 
and provided several constructions based on various cryptographic assumptions. 
They also considered PAs with additional privacy properties, namely a zero- 
knowledge (ZK) property, and showed two transformations from PAs into ZK- 
PAs, the first in the common reference string (CRS) model, and the second in 
the non-programmable random oracle (NPRO) model. 


1.1 Our Contribution 


In this paper, we study predictable arguments with privacy properties in more 
detail. Our results are three-fold: 

First, we provide a more efficient construction of ZK-PA in the CRS model. 
Compared to the generic transformation of [14], the resulting argument is much 
more efficient although it works only for a restricted class of languages; i.e., all 
languages that admit SPHFs. This includes all algebraic languages described 
in Sect. 2.2. 

Second, we answer an open problem raised in [14] and show how to con- 
struct witness indistinguishable PAs (WI-PA) in the plain model by using non- 
interactive witness indistinguishable (NIWI) proofs in the plain model. Infor- 
mally, in order to ensure that the verifier’s challenge in the first round is well- 
formed, we force the verifier to provide a NIWI proof for the statement that “the 
produced challenge is well-formed”. Witness-indistinguishability follows from the 
soundness of the underlying NIWI and the predictability of the argument. More- 
over, we provide a reduction that shows how an adversary breaking the sound- 
ness of the WI-PA can be exploited in order to violate the WI property of the 
underlying NIWI proof system. 

Third, motivated by the fact that a predictable argument (even without
privacy properties) is a strong notion¹, we put forward a relaxation of predictable
arguments, namely commit-and-prove² predictable arguments (CPPA), in which
all of the prover's responses except its first message can be predicted. We
formalize this notion for the language of dynamic statements of the form
$x = (cm, C, y)$, where $cm$ is the prover's first message, and $C$ is an arbitrary
polynomial-size circuit possibly specified by the verifier. In particular, we
consider a case where the prover publishes a first message $cm$, after which the prover
can run an unbounded number of predictable arguments for different but
correlated statements $(cm, C_i, y_i)$. In contrast to PAs, for which an efficient construction
based on standard assumptions (even without ZK) seems out of reach, we give
a construction of ZK-CPPA for any polynomial-size circuit $C \in \mathsf{P}$ in the NPRO
model using garbled circuits (GC) and oblivious transfer (OT). Our construction
is very similar to the three-round zero-knowledge argument of [15], with the main
differences being the reusability of the prover's first message and providing ZK
in the non-UC model under milder assumptions.


¹ This follows from the fact that predictable arguments and witness encryption (which
is only known to exist based on strong primitives like indistinguishability obfuscation)
are equivalent.


Applications. To demonstrate the usefulness of (CP)PA with privacy properties,
we give an application in the context of witness encryption. We consider
witness encryption schemes with a strong notion of privacy for the decryptor,
wherein a malicious encryptor should not learn any information about the
decryptor's witness, even after the decryptor reveals the decrypted message. Our
motivating applications for this scenario are dark pools and over-the-counter
(OTC) markets in which an investor (the encrypting party) is interested in
communicating with only those trading parties (potential decryptors) whose financial
conditions satisfy some constraint. To realize this application, a recent work by
Ngo et al. [23] introduced the notion of witness key agreement (WKA), which
allows the two sides to agree on a secret key $k$, given that the trading parties
hold a witness that satisfies the desired relation. We show in Sect. 6.1 that the
witness encryption (WE) interpretation of our ZK-CPPA construction can be
used to realize this application with an efficiency improvement in some aspects.


1.2 Related Work 


This paper is a follow-up to the work of Faonio et al. [14] that introduced
the notion of predictable arguments of knowledge (PAoK) systems. While PAs
are always honest-verifier zero-knowledge, providing zero-knowledge or even the
weaker notion of witness-indistinguishability is quite challenging. In [14], the
authors show a compiler for constructing ZK-PA in the CRS model and leave the
construction of WI-PA in the plain model as an open problem. We answer the
open problem and propose more efficient ZK-PAs in the CRS model. A related
work is that of Bitansky and Choudhuri [7], who recently constructed
deterministic-prover ZK arguments for NP and showed that such arguments imply ZK-PA for
NP. Unlike [7], which mainly focuses on feasibility results and requires strong
assumptions (e.g., indistinguishability obfuscation) in its construction, our work
considers practical solutions in the CRS model. In another related work, Dahari
and Lindell [13] studied deterministic-prover honest-verifier ZK arguments in the
plain model. In the same work, they also constructed full ZK arguments given that
the prover has access to a pair of witnesses, one of which can be used as a basis for
the prover's randomness. This differs from our ZK-PA construction, wherein the
prover is "truly deterministic", although at the cost of requiring a trusted setup.
The recent work of [10] introduced the notion of Witness Maps. A Unique
Witness Map (UWM) is a cryptographic notion that maps all the witnesses for an NP
statement to a single witness in a deterministic way. While UWMs can be seen as
deterministic-prover NIWI arguments, they differ from WI-PA in several respects,
making the two concepts incomparable. First, WI-PA does not require a trusted
setup in the form of a common reference string, whereas UWMs are in the CRS
model. Second, we consider WI-PA as an interactive protocol, whereas UWMs
are non-interactive. Lastly, although UWMs are deterministic-prover, they are not
necessarily predictable.


² We call our notion commit-and-prove PA because, roughly speaking, a prover first
commits to an input (once and for all) and later proves that an opening for the
commitment satisfies some properties of interest. Our name is also inspired by the
phrase "commit-and-prove schemes" used in some papers, e.g., [9].


2 Preliminaries 


Let PPT denote probabilistic polynomial-time. All adversaries throughout this
work will be stateful. By $y \leftarrow \mathcal{A}(x; r)$ we denote that $\mathcal{A}$, given input $x$ and
randomness $r$, outputs $y$. Let $\lambda \in \mathbb{N}$ be the security parameter and $\mathsf{negl}(\lambda)$ be
an arbitrary negligible function. We write $a \approx_\lambda b$ if $|a - b| \leq \mathsf{negl}(\lambda)$.


2.1 Pairings 


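A pairing is defined by a tuple $\mathsf{bp} = (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, \hat{e}, g_1, g_2)$ where $\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T$
are (additive) groups of prime order $p$, $g_1$ is a generator of $\mathbb{G}_1$, $g_2$ is a generator
of $\mathbb{G}_2$, and $\hat{e} : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ is an efficient, non-degenerate bilinear map. In
particular, $\hat{e}(a \cdot g_1, b \cdot g_2) = (ab) \cdot \hat{e}(g_1, g_2)$ for any $a, b \in \mathbb{Z}_p$. We denote
$[a]_\iota := a \cdot g_\iota$ for $\iota \in \{1, 2, T\}$, where we define $g_T = \hat{e}(g_1, g_2)$. The same notation
naturally extends to matrices $[\mathbf{M}]_\iota$ for a matrix $\mathbf{M}$ over $\mathbb{Z}_p$.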


2.2 Algebraic Languages 


We refer to algebraic languages as the set of languages associated to a
relation that can be described by algebraic equations over abelian groups. To be
more precise, let gpar be some global parameters, generated by a probabilistic
polynomial-time algorithm setup.gpar which takes the security parameter $\lambda$
as input. These global parameters can correspond to the description of groups
involved in the construction and usually include the description of a bilinear
group. Throughout the paper, we suppose that these global parameters are
implicitly given as input to each algorithm.

Let $\mathsf{lpar} = (M, \theta)$ be a set of language parameters generated by a polynomial-time
algorithm setup.lpar which takes gpar as input. Here, $M : \mathbb{G}^{\ell} \to \mathbb{G}^{n \times k}$ and
$\theta : \mathbb{G}^{\ell} \to \mathbb{G}^{n}$ are linear maps such that their different coefficients are not
necessarily in the same algebraic structures. Namely, in the most common case, given
a bilinear group $\mathsf{gpar} = (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, \hat{e}, [1]_1, [1]_2)$, they can belong to either
$\mathbb{Z}_p$, $\mathbb{G}_1$, $\mathbb{G}_2$, or $\mathbb{G}_T$ as long as the equation $\theta(x) = M(x) \cdot w$ is "well-consistent".

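Formally, for a set $\mathcal{X}_{\mathsf{lpar}}$ that defines the underlying domain, we define an
algebraic language $\mathcal{L}_{\mathsf{lpar}} \subset \mathcal{X}_{\mathsf{lpar}}$ as


$\mathcal{L}_{\mathsf{lpar}} = \{ x \in \mathbb{G}^{\ell} \mid \exists w \in \mathbb{Z}_p^{k} : \theta(x) = M(x) \cdot w \}$ .    (1)


An algebraic language where $M$ is independent of $x$ and $\theta$ is the identity function
is called a linear language.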
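
As an illustration of a linear language, consider the standard Diffie-Hellman
language with a two-element statement and a one-dimensional witness. This example is
not taken from the text above; it is a well-known special case, written in the
bracket notation of Sect. 2.1 for a single group $\mathbb{G}$ of prime order $p$ with
fixed elements $[1]$ and $[a]$.

```latex
% Illustrative example (an assumption of this sketch, not the paper's own):
% the Diffie-Hellman language as a linear language.
\[
  M = \begin{pmatrix} [1] \\ [a] \end{pmatrix} \in \mathbb{G}^{2 \times 1},
  \qquad
  \theta(x) = x \in \mathbb{G}^{2},
\]
\[
  \mathcal{L}_{\mathsf{lpar}}
  = \Bigl\{\, x = \begin{pmatrix} [u_1] \\ [u_2] \end{pmatrix} \in \mathbb{G}^{2}
      \;\Bigm|\; \exists\, w \in \mathbb{Z}_p :
      \begin{pmatrix} [u_1] \\ [u_2] \end{pmatrix}
      = \begin{pmatrix} [1] \\ [a] \end{pmatrix} \cdot w
      = \begin{pmatrix} [w] \\ [a w] \end{pmatrix}
    \Bigr\}.
\]
```

Here $M$ does not depend on $x$ and $\theta$ is the identity, so the language is
linear; membership states exactly that $([1], [a], [u_1], [u_2])$ is a Diffie-Hellman
tuple.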

Finally, we note that algebraic languages are as expressive as generic NP 
languages. This is because every binary circuit can be represented by a set of 
linear equations. 


2.3 Smooth Projective Hash Function 


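Let $\mathcal{L}_{\mathsf{lpar}}$ be an NP language, parametrized by a language parameter lpar, and
$\mathcal{R}_{\mathsf{lpar}} \subseteq \mathcal{X}_{\mathsf{lpar}}$ be its corresponding relation. A smooth projective hash function
(SPHF) [12] for $\mathcal{L}_{\mathsf{lpar}}$ is a cryptographic primitive with the property that, given
lpar and a statement $x$, one can compute a hash of $x$ in two different ways: either
by using a projection key hp and $(x, w) \in \mathcal{R}_{\mathsf{lpar}}$ as $\mathsf{pH} \leftarrow \mathsf{projhash}(\mathsf{lpar}; \mathsf{hp}, x, w)$,
or by using a hashing key hk and $x \in \mathcal{X}_{\mathsf{lpar}}$ as $\mathsf{H} \leftarrow \mathsf{hash}(\mathsf{lpar}; \mathsf{hk}, x)$. The formal
definition of an SPHF follows.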


Definition 1. An SPHF for $\{\mathcal{L}_{\mathsf{lpar}}\}$ is a tuple of PPT algorithms (setup, hashkg,
projkg, hash, projhash), which are defined as follows:


setup($1^\lambda$): Takes in a security parameter $\lambda$ and generates the global parameters
pp together with the language parameters lpar. We assume that all algorithms
have access to pp.

hashkg(lpar): Takes in a language parameter lpar and outputs a hashing key hk.

projkg(lpar; hk, x): Takes in a hashing key hk, lpar, and a statement x and outputs
a projection key hp, possibly depending on x.

hash(lpar; hk, x): Takes in a hashing key hk, lpar, and a statement x and outputs
a hash value H.

projhash(lpar; hp, x, w): Takes in a projection key hp, lpar, a statement x, and a
witness w for $x \in \mathcal{L}$ and outputs a hash value pH.


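An SPHF needs to satisfy the following properties:


Correctness. It is required that hash(lpar; hk, x) = projhash(lpar; hp, x, w) for all
$x \in \mathcal{L}$ and their corresponding witnesses w.


Smoothness. It is required that for any lpar and any $x \notin \mathcal{L}$, the following
distributions are statistically indistinguishable:


$\{ (\mathsf{hp}, \mathsf{H}) : \mathsf{hk} \leftarrow \mathsf{hashkg}(\mathsf{lpar}),\ \mathsf{hp} \leftarrow \mathsf{projkg}(\mathsf{lpar}; \mathsf{hk}, x),\ \mathsf{H} \leftarrow \mathsf{hash}(\mathsf{lpar}; \mathsf{hk}, x) \}$
$\{ (\mathsf{hp}, \mathsf{H}) : \mathsf{hk} \leftarrow \mathsf{hashkg}(\mathsf{lpar}),\ \mathsf{hp} \leftarrow \mathsf{projkg}(\mathsf{lpar}; \mathsf{hk}, x),\ \mathsf{H} \leftarrow_{\$} \Omega \}$ ,


where $\Omega$ is the set of hash values.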
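
To make the syntax of Definition 1 concrete, the following minimal sketch instantiates
it with the classic SPHF for the Diffie-Hellman language
$\{(g_1^w, g_2^w)\}$. This instantiation is a standard textbook example and is not
claimed to be the construction used in this paper. The group is a toy subgroup of
order 1019 modulo the prime 2039, far too small to be secure, and the language
parameters are fixed as module constants instead of being passed explicitly.

```python
import secrets

# Toy group (assumption of this sketch, NOT a secure size):
# P = 2Q + 1 with P, Q prime; the squares modulo P form a subgroup of order Q.
P, Q = 2039, 1019
G1, G2 = 4, 9  # two squares modulo P, hence generators of the order-Q subgroup


def hashkg():
    """Sample a hashing key hk = (alpha, beta) from Z_Q x Z_Q."""
    return (secrets.randbelow(Q), secrets.randbelow(Q))


def projkg(hk):
    """Projection key hp = G1^alpha * G2^beta (independent of the statement)."""
    alpha, beta = hk
    return pow(G1, alpha, P) * pow(G2, beta, P) % P


def hash_(hk, x):
    """Private evaluation: H = u1^alpha * u2^beta for the statement x = (u1, u2)."""
    alpha, beta = hk
    u1, u2 = x
    return pow(u1, alpha, P) * pow(u2, beta, P) % P


def projhash(hp, x, w):
    """Public evaluation with a witness w such that x = (G1^w, G2^w): pH = hp^w."""
    return pow(hp, w, P)


# Correctness check on a random statement in the language.
w = secrets.randbelow(Q)
x = (pow(G1, w, P), pow(G2, w, P))
hk = hashkg()
hp = projkg(hk)
print("hash == projhash:", hash_(hk, x) == projhash(hp, x, w))
```

Correctness holds because $\mathsf{hp}^w = g_1^{\alpha w} g_2^{\beta w} = u_1^{\alpha} u_2^{\beta}$
whenever $x = (g_1^w, g_2^w)$. Smoothness is not demonstrated by this sketch.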
2.4 Predictable Arguments


Predictable arguments are multi-round interactive protocols where the verifier
generates a challenge (which will be sent to the prover) and at the same time
it can predict the prover's response to that challenge. Here we recall the formal
definition of predictable arguments (PA) [14]³.

Let RG be a relation generator that takes in a security parameter $1^\lambda$ and
returns a polynomial-time decidable binary relation $\mathcal{R}_{\mathsf{lpar}}$. For a pair $(x, w) \in \mathcal{R}_{\mathsf{lpar}}$,
we call $x$ the statement and $w$ the witness. The set of all possible relations
$\mathcal{R}_{\mathsf{lpar}}$ that the relation generator RG (for a given $1^\lambda$) may output is denoted by
$\mathsf{RG}_\lambda$. To make the notation simple, we assume that $\mathcal{R}_{\mathsf{lpar}}$ can be described with
a language parameter lpar by which $\lambda$ can be deduced as well.


Definition 2 (Predictable Argument (PA)). A predictable argument for a
relation $\mathcal{R}_{\mathsf{lpar}}$ (with the corresponding language parameter lpar) is an interactive
protocol between a prover P and a verifier V, which can be specified by two
algorithms $\Pi_{\mathsf{pa}}$ = (Chall, Resp) defined as follows:


(Executed by V): $(c, b) \leftarrow \mathsf{Chall}(\mathsf{lpar}, x)$. The algorithm takes in lpar and a
statement x, and returns a challenge c along with a predicted answer b.

(Executed by P): $a \leftarrow \mathsf{Resp}(\mathsf{lpar}, x, w, c)$. The algorithm takes in lpar, a pair of
statement-witness (x, w) and a challenge c, and returns an answer a.

(Executed by V): If a = b, V returns acc; otherwise it returns rej.


We denote by $\langle P(\mathsf{lpar}, x, w), V(\mathsf{lpar}, x) \rangle$ an execution between P and V with
common inputs (lpar, x) and prover's secret input w. The success of the prover
in convincing the verifier is denoted by $\langle P(\mathsf{lpar}, x, w), V(\mathsf{lpar}, x) \rangle = \mathsf{acc}$. Also, we
refer to (c, b) interchangeably as the output of Chall() or the output of V running
Chall(). The same convention holds for a.
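
The Chall/Resp syntax of Definition 2 can be sketched on top of a smooth projective
hash function: the challenge is a projection key, the predicted answer is the private
hash, and the prover's answer is the projected hash. The sketch below uses a toy
Diffie-Hellman-language SPHF as a stand-in; the group parameters and the optional
`hk` argument (used only to make the run reproducible) are assumptions of this
illustration, not part of the definition.

```python
import secrets

# Toy order-Q subgroup of Z_P^* (assumption of this sketch, NOT a secure size).
P, Q = 2039, 1019
G1, G2 = 4, 9  # generators of the order-Q subgroup


def chall(x, hk=None):
    """Verifier: return (c, b), a challenge and the predicted answer for x = (u1, u2)."""
    alpha, beta = hk if hk is not None else (secrets.randbelow(Q), secrets.randbelow(Q))
    u1, u2 = x
    c = pow(G1, alpha, P) * pow(G2, beta, P) % P   # projection key sent to the prover
    b = pow(u1, alpha, P) * pow(u2, beta, P) % P   # hash the verifier expects back
    return c, b


def resp(x, w, c):
    """Prover: deterministic answer computed from the challenge and the witness."""
    return pow(c, w, P)


def verify(a, b):
    """Verifier: accept iff the answer equals the prediction."""
    return "acc" if a == b else "rej"


# Honest run on a true statement x = (G1^w, G2^w).
w = 123
x = (pow(G1, w, P), pow(G2, w, P))
c, b = chall(x)
print("true statement: ", verify(resp(x, w, c), b))

# A false statement (different exponents) with a fixed, non-degenerate hashing key.
x_bad = (pow(G1, 5, P), pow(G2, 6, P))
c, b = chall(x_bad, hk=(3, 7))
print("false statement:", verify(resp(x_bad, 5, c), b))
```

The prover is deterministic given the challenge, as the definition requires. For the
false statement above the prover's answer misses the prediction because the two
exponents differ and the second component of the hashing key is non-zero.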

We require two properties for a PA: completeness and soundness. 


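– (Perfect) Completeness. A predictable argument has perfect completeness
if for all $\lambda \in \mathbb{N}$, for all $\mathcal{R}_{\mathsf{lpar}} \in \mathsf{RG}_\lambda$, and for all $(x, w) \in \mathcal{R}_{\mathsf{lpar}}$

$\Pr[\, a = b : (c, b) \leftarrow \mathsf{Chall}(\mathsf{lpar}, x);\ a \leftarrow \mathsf{Resp}(\mathsf{lpar}, x, w, c) \,] = 1$

– $\epsilon$-Soundness. For all $\lambda \in \mathbb{N}$, all $x \notin \mathcal{L}_{\mathsf{lpar}}$, and all PPT adversaries $\mathcal{A}$

$\Pr[\, a = b : \mathcal{R}_{\mathsf{lpar}} \leftarrow_{\$} \mathsf{RG}_\lambda;\ (c, b) \leftarrow \mathsf{Chall}(\mathsf{lpar}, x);\ a \leftarrow \mathcal{A}(\mathsf{lpar}, x, c) \,] \approx_\lambda \epsilon$


We call a PA sound if $\epsilon \in \mathsf{negl}(\lambda)$. A PA is secure if it is complete and sound.
Furthermore, we say that a PA is zero-knowledge (ZK-PA) if there exists a PPT
algorithm Sim that computes the predicted answer of any valid statement x
without knowing the random coins used in Chall() nor any witness for x, but only
knowing the challenge c. In the case of ZK in the CRS model, the algorithm also
takes in a CRS trapdoor $\tau$, which is generated by a setup algorithm
$(\mathsf{crs}_\tau, \tau) \leftarrow \mathsf{setup}(1^\lambda)$. For notational simplicity, we assume that in this case lpar
contains $\mathsf{crs}_\tau$ as well.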


³ We define PAs as one-round protocols. As shown in [14], this is without loss of
generality as every $\rho$-round PA can be squeezed into a one-round PA.


548 H. Khoshakhlagh 


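Zero-Knowledge


$(\mathsf{crs}_\tau, \tau) \leftarrow \mathsf{setup}(1^\lambda)$; $(x, w, c) \leftarrow \mathcal{A}(\mathsf{crs}_\tau, \tau)$;
if $\mathcal{R}_{\mathsf{lpar}}(x, w) = 0$, then return 0;
$b \leftarrow_{\$} \{0, 1\}$; if $b = 0$ then $a \leftarrow \mathsf{Resp}(\mathsf{lpar}, x, w, c)$; else $a \leftarrow \mathsf{Sim}(\mathsf{lpar}, x, \tau, c)$;
$b' \leftarrow \mathcal{A}(a)$;
return $b = b'$;


Fig. 1. Experiment for the definition of zero-knowledge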


– Zero-Knowledge. A predictable argument $\Pi$ is zero-knowledge if there
exists a PPT simulator Sim such that for all PPT adversaries $\mathcal{A}$,
$\Pr[\mathsf{Exp}^{\mathsf{zk}}_{\Pi,\mathsf{Sim}}(\mathcal{A}, \lambda) = 1] \approx_\lambda \frac{1}{2}$, where $\mathsf{Exp}^{\mathsf{zk}}_{\Pi,\mathsf{Sim}}(\mathcal{A}, \lambda)$ is depicted in Fig. 1.

In this work, we also consider a weaker version of zero-knowledge, called
witness indistinguishability (WI), which informally states that the adversarial
verifier cannot identify which witnesses are held by the prover.


– Witness-Indistinguishability. A predictable argument $\Pi$ is statistically
witness indistinguishable if for any adversary $\mathcal{A}$, for any common statement $x$,
and for any witnesses $w_1, w_2$ such that $(x, w_1) \in \mathcal{R}_{\mathsf{lpar}}$ and $(x, w_2) \in \mathcal{R}_{\mathsf{lpar}}$, the
following holds:

$\langle P(\mathsf{lpar}, x, w_1), \mathcal{A}(\mathsf{lpar}, x) \rangle \approx_\lambda \langle P(\mathsf{lpar}, x, w_2), \mathcal{A}(\mathsf{lpar}, x) \rangle$


2.5 Oblivious Transfer 


A 2-round oblivious transfer (OT) is a protocol between a receiver and a sender
and consists of three polynomial-time algorithms $\Pi_{\mathsf{OT}} = (\Pi_{\mathsf{OT}}^{R}, \Pi_{\mathsf{OT}}^{S}, \Pi_{\mathsf{OT}}^{O})$
defined as follows:


First round. The receiver generates the first message $m^R \leftarrow \Pi_{\mathsf{OT}}^{R}(b; r^R)$ for the
selection bit $b \in \{0, 1\}$ and random tape $r^R \in \{0, 1\}^{\mathsf{poly}(\lambda)}$.

Second round. For the input messages $(x^0, x^1)$, where $x^l \in \{0, 1\}^{\mathsf{poly}(\lambda)}$ for
$l \in \{0, 1\}$, the sender generates the second message $m^S \leftarrow \Pi_{\mathsf{OT}}^{S}(m^R, (x^0, x^1); r^S)$
using random tape $r^S \in \{0, 1\}^{\mathsf{poly}(\lambda)}$.

Output. The receiver computes the output $x^b = \Pi_{\mathsf{OT}}^{O}(m^S, b, r^R)$.


In this work, we are interested in OT protocols that are correct and securely
implement the standard ideal OT functionality $\mathcal{F}_{\mathsf{OT}}$ in the presence of malicious
adversaries. Moreover, we require an additional property called
sender-extractability [15], which at a high level means that the randomness of the sender
is sufficient to reconstruct its input. The formal definition of this property and
$\mathcal{F}_{\mathsf{OT}}$ can be found in the full version [22].


2.6 Garbled Circuits


We recall the definition of garbling schemes formalized in [3]. At a high level, a
garbling scheme consists of four algorithms GC = (Garble, Encode, Eval, Decode)
defined as follows: Garble takes a circuit $C$ and outputs a garbled circuit $\widehat{C}$,
encoding information $e$, and decoding information $d$. Encode takes encoding
information $e$ and an input $x$, and outputs a garbled input $X$. Eval takes as input a
garbled circuit $\widehat{C}$ and a garbled input $X$, and outputs a garbled output $Y$. Finally,
Decode takes $d$ and a garbled output $Y$, and outputs a plain output $y$. In this work,
we also assume an extra verification algorithm Verify that takes $(C, \widehat{C}, e)$ as input and
outputs 1 if this triple is valid.

A garbling scheme GC should satisfy correctness and the following security 
properties: authenticity which informally captures the unforgeability of the out- 
put of a garbled circuit evaluations, and verifiability that ensures the existence 
of an algorithm Verify that takes a circuit C, a (possibly maliciously generated) 
garbled circuit C, and encoding information e, and outputs 1 if C is a valid 
garbling of C. The formal definition of these properties can be found in the full 
version [22]. 
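
As an illustration of the five-algorithm interface, the sketch below garbles a single AND gate with a hash-based one-time pad. It is a toy under stated assumptions, not the scheme of [3]: the label length, the zero-tag trick for finding the right table row, and all function names are choices made here for illustration, and `verify` only checks consistency with the AND truth table.

```python
import hashlib
import secrets

LABEL = 16  # label length in bytes (illustrative choice)


def _pad(ka, kb):
    """Derive a 32-byte pad from two input labels."""
    return hashlib.sha256(ka + kb).digest()


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def garble():
    """Garble one AND gate: returns (garbled table, encoding info e, decoding info d)."""
    e = [(secrets.token_bytes(LABEL), secrets.token_bytes(LABEL)) for _ in range(2)]
    d = (secrets.token_bytes(LABEL), secrets.token_bytes(LABEL))
    # Each row hides the output label for a & b, followed by a zero tag.
    table = [_xor(_pad(e[0][a], e[1][b]), d[a & b] + bytes(LABEL))
             for a in (0, 1) for b in (0, 1)]
    secrets.SystemRandom().shuffle(table)
    return table, e, d


def encode(e, x):
    """Map plain input bits to their wire labels."""
    return [e[i][bit] for i, bit in enumerate(x)]


def evaluate(table, X):
    """Try every row; the zero tag identifies the one that decrypts correctly."""
    pad = _pad(X[0], X[1])
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL:] == bytes(LABEL):
            return plain[:LABEL]
    raise ValueError("no row decrypts under these labels")


def decode(d, Y):
    """Reject anything that is not one of the two output labels."""
    if Y not in d:
        raise ValueError("invalid garbled output")
    return d.index(Y)


def verify(table, e):
    """Return 1 iff the table, opened with e, behaves like an AND gate."""
    try:
        out = {(a, b): evaluate(table, [e[0][a], e[1][b]])
               for a in (0, 1) for b in (0, 1)}
    except ValueError:
        return 0
    zero = {out[0, 0], out[0, 1], out[1, 0]}
    return int(len(zero) == 1 and out[1, 1] not in zero)
```

An evaluator holding one label per wire learns a single output label; producing the other one without the table's secrets is what authenticity rules out.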


3 More Efficient ZK-PA 


PAs have deterministic provers and hence, by an impossibility result of 
Goldreich and Oren [18], cannot be zero-knowledge in the plain model for non-trivial 
languages. Faonio et al. [14] circumvented this impossibility and provided two 
constructions by using setup assumptions. Their first construction in the CRS 
model is based on the natural idea of adding a NIZK proof of knowledge $\pi$ for 
the “well-formedness” of the challenge generated by the challenger. Although 
this gives a generic compiler for constructing ZK-PAs from PAs, here we 
investigate designing out-of-the-box ZK-PA protocols with concrete efficiency. We 
give a construction in the CRS model which is based on the notion of Trapdoor 
Smooth Projective Hash Functions (TSPHFs). 


3.1 TSPHF-Based ZK-PAs in the CRS Model 


As shown in [14], PAs can be constructed from SPHFs, but since the projection 
key in SPHFs can be generated in a malicious way, they can provide only the 
honest-verifier zero-knowledge property and it is not clear how to construct ZK-PAs from 
standard SPHFs directly. Benhamouda et al. [4] defined the notion of trapdoor 
SPHFs (TSPHFs) as an extension of SPHFs in which one can verify the 
correctness of the projection key generation. In more detail, a TSPHF comes with 
three additional algorithms (tsetup, verHP, thash). tsetup outputs a CRS $crs_\tau$ with 
a trapdoor $\tau$. The trapdoor $\tau$ can be used by thash to compute the hash value of 
any statement $x$ (knowing only the public hp). The algorithm verHP takes as input a key 
hp and the CRS $crs_\tau$, and outputs 1 if hp is a valid projection key. The properties 
a TSPHF must satisfy are the same as for an SPHF, except that the smoothness property is 
no longer statistical but computational, as hp should now contain enough 
information to compute the hash of any statement. Moreover, a TSPHF should satisfy 
the zero-knowledge property, which informally states that for any statement $x$ with 
valid witness $w$, the projected hash value pH $\leftarrow$ projhash(lpar; hp, x, w) should be 
indistinguishable from the trapdoor hash value tH $\leftarrow$ thash(lpar; hp, x, $\tau$). For a 
more formal definition of TSPHFs, we refer the reader to [4]. 


- Setup($1^\lambda$): Run ($crs_\tau$, $\tau$) $\leftarrow$ tsetup($1^\lambda$) and return ($crs_\tau$, $\tau$). 
- Chall(lpar, x): 
  • Run hk $\leftarrow_\$$ hashkg(lpar) and hp $\leftarrow$ projkg(lpar; hk, x). 
  • Compute H $\leftarrow$ hash(lpar; hk, x). 
  • Return (c, b) := (hp, H). 
- Resp(lpar, x, w, c): For c := hp, check if verHP($crs_\tau$, hp) = 1, then run 
  pH $\leftarrow$ projhash(lpar; hp, x, w) and return a := pH. 
- Sim(lpar, x, $\tau$, c): Parse c := hp and return tH $\leftarrow$ thash(lpar; hp, x, $\tau$). 


Fig. 2. ZK-PA $\Pi_{zkpa}$ from TSPHFs. 
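
The Chall and Resp steps can be illustrated with a plain (non-trapdoor) SPHF for the Diffie-Hellman language, in the style of the classic hash proof system. This sketch covers only the predictable-argument part: the trapdoor algorithms tsetup, verHP and thash need a bilinear group and are omitted. The toy modulus, the two bases and the function names are assumptions made here, and no security is claimed for these parameters.

```python
import secrets

P = 2**127 - 1            # toy prime modulus, illustration only
G1 = 3                    # first base
G2 = pow(G1, 0xBEEF, P)   # hypothetical second base; its dlog must be unknown in practice


def statement(w):
    """A statement in the language: x = (G1^w, G2^w) with witness w."""
    return pow(G1, w, P), pow(G2, w, P)


def chall(x):
    """Verifier: sample hk, publish hp as the challenge c, keep hash(hk, x) as the answer b."""
    u1, u2 = x
    a1, a2 = secrets.randbelow(P - 1), secrets.randbelow(P - 1)   # hashing key hk
    hp = pow(G1, a1, P) * pow(G2, a2, P) % P                       # projection key
    h = pow(u1, a1, P) * pow(u2, a2, P) % P                        # hash value under hk
    return hp, h


def resp(x, w, c):
    """Prover: the projected hash hp^w equals the verifier's hash for true statements."""
    return pow(c, w, P)
```

For a true statement, hp^w = G1^(a1 w) G2^(a2 w) = u1^a1 u2^a2, so the deterministic response matches the predicted answer.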

In this section, we show the connection between ZK-PAs and TSPHFs [6], 
namely we construct a ZK-PA for a relation $R_{\mathsf{lpar}}$ given a TSPHF for the same 
relation. Different from [14], the relation $R_{\mathsf{lpar}}$ is the same for both primitives. This is because [14] 
considers the connection for knowledge-sound PAs (and extractable SPHFs), 
whereas here we only consider soundness and (computational) smoothness. As 
a direct result of this, we obtain ZK-PAs for all languages that admit TSPHFs 
(i.e., algebraic languages). 


3.2 Construction of ZK-PA from TSPHFs 


Let $\Pi_{tsphf}$ = (setup, tsetup, hashkg, projkg, hash, projhash, verHP, thash) be a 
TSPHF for $L_{\mathsf{lpar}}$. The construction of $\Pi_{zkpa}$ = (Setup, Chall, Resp, Sim) in the 
CRS model is given in Fig. 2. Due to space constraints, the proof of the next 
theorem is deferred to the full version [22]. 


Theorem 1. If the TSPHF $\Pi_{tsphf}$ is correct, (computationally) smooth and 
zero-knowledge, then $\Pi_{zkpa}$ in Fig. 2 is secure and zero-knowledge. 


Instantiation and Efficiency Evaluation. Given the above connection, one 
can now obtain a secure ZK-PA for any algebraic language $L_{\mathsf{lpar}}$ with lpar = 
$(M, \theta)$ (see Eq. 1) in the bilinear setting based on the efficient construction of 
TSPHF in [6] (see the full version [22] for details). The resulting ZK-PA is sound 
under the DDH assumption in $\mathbb{G}_2$ (see [6], Appendix E.3 for the security proof). 
To evaluate efficiency, we note that compared to the original construction of 
ZK-PA in [14], the above construction is more efficient as it only has one more 
group element in the challenge c (compared to the non-ZK construction of PA), 
whereas the idea of adding a NIZK proof for the well-formedness of c in [14] has 
at least a linear overhead in the size of c.⁴ 


Remark 1. Recently, Abdolmaleki et al. [1] showed how one can use non-black-box 
techniques to construct a subversion-resistant variant of smooth projective hash 
functions. Following a similar approach directly yields a construction of ZK-PA 
in the plain model, thus giving another way to circumvent the impossibility of [18] 
using non-black-box techniques. The recent work of [7] also constructs ZK-PAs 
for all of NP. Their construction, however, mainly focuses on a feasibility result 
rather than efficiency, and requires strong assumptions such as 
indistinguishability obfuscation. Moreover, while the zero-knowledge simulator in their 
construction is non-black-box, which is inherent in the plain model, we instead focus 
on more efficient constructions in the CRS model. 


4 Witness-Indistinguishable Predictable Arguments 


Due to a classical impossibility result [18], a prerequisite for constructing 
2-message ZK proof systems based on black-box techniques is a common reference 
string (CRS), i.e., a string generated by a trusted party to which both prover and 
verifier have access. Requiring such a trust model may however be overkill for 
some applications where a weaker notion of privacy such as witness 
indistinguishability (WI) is sufficient. Weaker than the ZK property, this property states 
that for any two possible witnesses $w_1, w_2$, an adversary cannot distinguish proofs 
generated using $w_1$ from proofs generated using $w_2$. Given a PA $\Pi_{pa}$ = (Chall, Resp) 
for an NP language $L$, we show how to construct a WI-PA $\Pi_{wipa}$ = (Chall’, Resp’) 
for the same language. At first it may seem that regardless of which witness is 
used by the prover when running Resp, it has the same functionality since all 
the witnesses return the same (predicted) answer. This argument is however not 
true: while for an honestly generated challenge, Resp behaves the same regardless 
of which valid witness is used, this might not be true for maliciously generated 
challenges. To circumvent this issue, the key idea is to require the verifier to 
prove that the challenge is indeed generated from a proper run of Chall with 
some randomness. This should be done without breaking soundness, 
meaning the secret coins of the verifier should be kept hidden from the prover. To 
this end, we will use a NIWI proof system as an ingredient, through which the 
verifier proves the following statement: there exists a random string $\alpha$ such that 
c is the first output of Chall(lpar, x; $\alpha$). The prover first checks if the NIWI proof verifies and if so, 
computes the predicted answer as before. 

Since we use a NIWI proof system in the plain model as an ingredient of our 
construction, below we recall the definition of WI for such proof systems. We 
note that a construction of NIWI in the plain model for all NP languages and 
based on standard assumptions is presented in [21]. 


4 Here we are assuming that the security of the construction should hold under 
standard and falsifiable assumptions, as it is easy to construct succinct NIZKs based 
on non-falsifiable assumptions. 
- Chall’(lpar, x): the verifier computes (c, b) $\leftarrow$ Chall(lpar, x; $\alpha$) and sends 
  the challenge c along with a NIWI proof $\pi$ for the existence of $\alpha$ such 
  that c is the first output of Chall(lpar, x; $\alpha$). 

- Resp’(lpar, x, w, c, $\pi$): the prover first checks the NIWI proof $\pi$. If $\pi$ 
  verifies, the prover computes a $\leftarrow$ Resp(lpar, x, w, c) and returns a. 


Fig. 3. Construction of WI-PA 


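Definition 3. Let $\Pi_{niwi} = (P_{niwi}, V_{niwi})$ be a non-interactive proof system for a 
language $L_{\mathsf{lpar}}$. We say that $\Pi_{niwi}$ is computationally witness-indistinguishable if 
for all $(x, w_1, w_2)$ such that $(x, w_1) \in R_{\mathsf{lpar}}$ and $(x, w_2) \in R_{\mathsf{lpar}}$, and for all PPT 
adversaries $\mathcal{A}$, 


$$\Pr[\mathcal{A}(\pi) = 1 : \pi \leftarrow P_{niwi}(\mathsf{lpar}, x, w_1)] \approx_c \Pr[\mathcal{A}(\pi) = 1 : \pi \leftarrow P_{niwi}(\mathsf{lpar}, x, w_2)].$$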


4.1 Our Construction 


Let $\Pi_{pa}$ = (Chall, Resp) be a predictable argument for the language $L_{\mathsf{lpar}}$, and $\Pi_{niwi}$ 
be a non-interactive computational WI proof system in the plain model for the 
language of statements c for which there exists $\alpha$ such that c = Chall(lpar, x; $\alpha$). 
We construct a WI-PA $\Pi_{wipa}$ = (Chall’, Resp’) for $L_{\mathsf{lpar}}$ as depicted in Fig. 3. The 
completeness of the construction follows straightforwardly from the completeness 
of $\Pi_{pa}$. We prove soundness and WI in the next theorem. 


Theorem 2. The construction in Fig. 3 is a statistical witness-indistinguishable 
predictable argument in the plain model. 


Proof. Soundness. Let $x \notin L_{\mathsf{lpar}}$ and $\mathcal{A}$ be an efficient adversary that breaks the 
soundness of $\Pi_{wipa}$ by convincing the honest verifier V with non-negligible 
probability $\varepsilon$, i.e., $\varepsilon(n) \geq 1/p(n)$ for some polynomial p and for infinitely many n. 
Denoting this set by N, we restrict ourselves to $n \in N$ from now on. This 
indicates that there exists a first message c from V on which $\mathcal{A}$ convinces V 
with probability at least $\varepsilon$. Fix this challenge c and the corresponding answer b 
computed by V. Define a set S as follows: 


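$$S = \{\, b : \Pr[\mathcal{A}(\mathsf{lpar}, x, c) = b \mid (c, b) \leftarrow \mathsf{Chall}(\mathsf{lpar}, x)] \geq \varepsilon/2 \,\}.$$


Fix some $b_0 \in S$ and define $S_0 \subseteq S$ as 

$$S_0 = \{\, b \in S : \Pr[\mathcal{A}(\mathsf{lpar}, x, c) = b \mid (c, b_0) \leftarrow \mathsf{Chall}(\mathsf{lpar}, x)] \geq \varepsilon/4 \,\}.$$


Since $b_0 \in S$, we have that $\Pr[\mathcal{A}(\mathsf{lpar}, x, c) = b_0 \mid (c, b_0) \leftarrow \mathsf{Chall}(\mathsf{lpar}, x)] \geq \varepsilon/2$. 
As every other element of $S_0$ is output with probability at least $\varepsilon/4$ and the 
probabilities sum to at most 1, we get $(|S_0| - 1) \cdot \varepsilon/4 + \varepsilon/2 \leq 1$ 
and therefore $|S_0| \cdot \varepsilon/4 \leq 1 - \varepsilon/4$, which consequently implies that $|S_0| < 4/\varepsilon$. Now, 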
the fact that $\varepsilon$ is non-negligible indicates that $|S_0|$ is bounded by a polynomial. 
On the other hand, we have that $\Pr[b \in S] \geq \varepsilon/2$, and that S is exponential 
in the security parameter $\lambda$. This means that there should exist $b_1 \in S$ such 
that $b_1 \notin S_0$. We now construct a non-uniform PPT adversary $\mathcal{B}$ that breaks 
the witness-indistinguishability of $\Pi_{niwi}$. Let aux = $(\alpha_0, \alpha_1, b_0, b_1)$ be such that 
$(c, b_0) \leftarrow$ Chall(lpar, x; $\alpha_0$) and $(c, b_1) \leftarrow$ Chall(lpar, x; $\alpha_1$). Given aux as advice, $\mathcal{B}$ 
proceeds as follows: it first returns $(c, (b_0, \alpha_0), (b_1, \alpha_1))$ to the WI challenger and 
obtains a proof $\pi$. Next, $\mathcal{B}$ calls $\mathcal{A}$ on input $(\pi, c)$ and returns i when it receives 
$b_i$ from $\mathcal{A}$. Note that for $\pi$ computed using $(\alpha_0, b_0)$, $\mathcal{A}$ returns $b_1$ with 
probability at most $\varepsilon/4$, whereas for $\pi$ computed using $(\alpha_1, b_1)$, $\mathcal{A}$ returns $b_1$ with 
probability at least $\varepsilon/2$. This makes $\mathcal{B}$ a successful adversary in breaking WI. 

WI. Let V* be an adversary against the WI property of $\Pi_{wipa}$ and $(x, w_1, w_2)$ be 
such that $(x, w_1), (x, w_2) \in R_{\mathsf{lpar}}$. It follows from the (statistical) soundness of the 
NIWI proof that V*’s first message is computed correctly with overwhelming 
probability. This, together with the predictability of the argument, indicates that the 
answer from the prover is unique regardless of which witness is used, and thus 
completes the proof. 


5 Commit-and-Prove Predictable Arguments 


We study a relaxed notion of predictability in interactive argument systems 
which consists of two phases: In phase 1 (commitment phase), the prover 
commits to its witness once and for all and sends the commitment to the verifier. In 
phase 2 (challenge-response phase), the prover and the verifier engage in a 
predictable argument protocol, where the verifier’s challenges may depend on the 
commitment in such a way that the prover’s responses can be predicted by the 
verifier. The relations we consider are of the following form: a statement 
x = (cm, C, y) and a witness (w, d) are in the relation (i.e., $(x, (w, d)) \in R$) iff 
“cm commits to w with randomness d, and C(w) = y”. Here C is a circuit in some 
polynomial-size circuit class $\mathcal{C}$ and y is the expected output of the circuit. 


Definition 4 (Commit-and-Prove Predictable Arguments). Let $\mathcal{C}$ be a 
class of polynomial-sized circuits. A commit-and-prove predictable argument (CPPA) for 
$\mathcal{C}$ is a multi-round protocol (between a prover P and a verifier V) which consists 
of three algorithms $\Pi_{cppa}$ = (Commit, Chall, Resp): 


Commitment phase (executed by P): cm $\leftarrow$ Commit(w; d) on input a value 
w, generates a commitment cm by using some randomness d. 
Interaction phase. Each round proceeds as follows: 
- (Executed by V): (c, b) $\leftarrow$ Chall(cm, C, y) on input a statement (cm, C, y) 
  such that $C \in \mathcal{C}$, generates a challenge c and a predicted answer b. 
- (Executed by P): a $\leftarrow$ Resp(cm, C, w, d, c) on input a commitment cm, 
  a circuit $C \in \mathcal{C}$, the committed value w, the randomness d, and the 
  challenge c, returns a response a. 
V accepts the proof iff a = b in all rounds. 
We call a CPPA a $\rho$-round CPPA if the interaction phase consists of $\rho$ 
rounds. A CPPA should satisfy completeness and soundness as defined below: 


(Perfect) Completeness. An honest prover with a statement x = (cm, C, y) 
and witness (w, d) such that (w, d) opens the commitment (i.e., cm = 
Commit(w; d)) and C(w) = y can always convince the verifier. 
More precisely, a CPPA has perfect completeness if for 
all $\lambda \in \mathbb{N}$, for all $C \in \mathcal{C}$, and for all $(x = (cm, C, y), (w, d)) \in R$, 

$$\Pr[\, a = b : (c, b) \leftarrow \mathsf{Chall}(cm, C, y);\ a \leftarrow \mathsf{Resp}(cm, C, w, d, c) \,] = 1.$$

$\varepsilon$-Soundness. For all $\lambda \in \mathbb{N}$ and all (stateful) PPT adversaries $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$, 

$$\Pr\left[\, a = b \wedge C(w) \neq y : (w, d, C, y) \leftarrow \mathcal{A}_1(1^\lambda);\ cm \leftarrow \mathsf{Commit}(w; d);\ (c, b) \leftarrow \mathsf{Chall}(cm, C, y);\ a \leftarrow \mathcal{A}_2(w, d, C, y, c) \,\right] \leq \varepsilon.$$

We call a CPPA sound if $\varepsilon \in negl(\lambda)$. A CPPA is secure if it is complete and 
sound. Similar to PAs, one can show that CPPAs can also be made extremely 
laconic in terms of both round complexity and proof complexity. Specifically, the 
same technique as in [14] can be used to collapse any $\rho$-round CPPA into a 
single-round CPPA. 

In this work, we only focus on CPPA protocols with the zero-knowledge prop- 
erty. A CPPA is zero-knowledge (ZK-CPPA) if there exists a PPT algorithm Sim 
that computes the predicted answer of any valid statement x without knowing 
the random coins used by Chall() nor any witness for x, but only knowing the 
challenge c. Since our construction of ZK-CPPA is in the non-programmable 
random oracle (NPRO) model, we define this property in this model. 


Definition 5 (Zero-knowledge CPPA in the NPRO model). We say 
that a CPPA (Commit, Chall, Resp) for a class of circuits $\mathcal{C}$ satisfies the 
zero-knowledge property in the NPRO model if for any PPT adversary $\mathcal{A}$, there exists 
a PPT simulator Sim such that for all PPT distinguishers D, for all $(x, w) \in R$, 
and all auxiliary inputs $z \in \{0,1\}^*$, we have: 


$$\max \Big| \Pr[\, D^H(x, \tau, z) = 1 : \tau \leftarrow (P^H(x, w) \leftrightarrow \mathcal{A}^H(x, z)) \,] - \Pr[\, D^H(x, \tau, z) = 1 : \tau \leftarrow \mathsf{Sim}^H(x, z) \,] \Big| \leq negl(|x|),$$


where P and $\mathcal{A}$ are respectively the prover and the (malicious) verifier 
running the CPPA protocol, and $P^H(x, w) \leftrightarrow \mathcal{A}^H(x, z)$ denotes the random variable 
corresponding to a protocol transcript on input (x, w). 


We now give our construction of ZK-CPPA for all polynomial-size circuits P 
in the NPRO model. The construction is similar to the three-round ZK protocol 
of [15], with the difference that the first message in our protocol is reusable. 
Moreover, here we only focus on providing ZK property as defined above, whereas 
the construction of [15] shows ZK in the UC model. 
5.1 ZK-CPPA Based on Garbled Circuits and Oblivious Transfer 


Let GC = (Garble, Encode, Eval, Decode, Verify) be a garbling scheme with 
correctness, authenticity, and verifiability, and $\Pi_{OT} = (\Pi^R_{OT}, \Pi^S_{OT}, \Pi^O_{OT})$ be a 
sender-extractable oblivious transfer protocol that realizes $\mathcal{F}_{OT}$. At a high level, the 
construction proceeds as follows. The prover P with witness $w = (w_1, \ldots, w_n) \in 
\{0,1\}^n$ plays the role of the receiver in n instances of the OT protocol and 
commits to its witness bits by providing $w_j$ as input to the j-th instance of $\Pi_{OT}$. Let 
$m_j^R \leftarrow \Pi^R_{OT}(w_j; r_j^R)$ and define cm and d as the sets $\{m_j^R\}_{j \in [n]}$ and $\{r_j^R\}_{j \in [n]}$, 
respectively. For a circuit-value pair (C, y) of the verifier’s choice, let $\hat{C}$ be a 
circuit that realizes the following relation R: R(x = (cm, C, y), (w, d)) = 1 iff 
(w, d) opens cm and C(w) = y. The verifier V constructs a garbled circuit $\widetilde{C}$ for $\hat{C}$ and sends 
it along with the second message of the OT as the challenge c. Moreover, V sets 
the predicted answer b to be the output 1-key $k^1$ of the final gate in the circuit. 
Now, P with a valid witness (w, d) evaluates $\widetilde{C}$ and sends the obtained garbled 
output $a = k^1$ as the predicted answer. It is not hard to see that this 
construction results in a CPPA. To additionally ensure the ZK property, we follow the same 
approach as [15] by requiring V to also provide a ciphertext $ct = H(k^1) \oplus r$ 
(denoted T in Fig. 4), where H is a random oracle and r is the randomness used by V to produce 
the second message of the OT. When P computes $k^1$, she first recovers r and 
then computes all the labels by executing the extractor Ext guaranteed by the 
sender-extractability property. Finally, P verifies whether the garbled circuit has been 
constructed correctly and, if so, sends the predicted answer $a = k^1$ to V. The 
resulting protocol $\Pi_{cppa}$ is described in Fig. 4. The proof idea is similar in spirit 
to the proof of Theorem 4.2 in [15]. We give a proof sketch here. 
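
The masking step that ties the verifier's OT randomness to the predicted answer can be sketched in isolation. The snippet below is only an illustration under assumptions made here: SHAKE-256 stands in for the random oracle H, and the function names and byte lengths are hypothetical. It shows that whoever holds the output label $k^1$ can unmask r, while a different label yields an unrelated string.

```python
import hashlib
import secrets


def oracle(key: bytes, out_len: int) -> bytes:
    """Stand-in for the random oracle H, stretched to out_len bytes (assumption: SHAKE-256)."""
    return hashlib.shake_256(key).digest(out_len)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mask_randomness(k1: bytes, r_s: bytes) -> bytes:
    """Verifier side: T = H(k^1) XOR r, where r is the sender's OT randomness."""
    return xor(oracle(k1, len(r_s)), r_s)


def recover_randomness(y: bytes, t: bytes) -> bytes:
    """Prover side: r = H(Y) XOR T, which is correct only if Y equals k^1."""
    return xor(oracle(y, len(t)), t)
```

With the recovered r in hand, the prover can run the OT extractor and the Verify algorithm before releasing its answer, which is what prevents a malicious verifier from learning anything from a malformed challenge.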


Theorem 3. Let GC be a correct, authentic, and verifiable garbling scheme, 
$\Pi_{OT}$ be a sender-extractable OT protocol that securely implements $\mathcal{F}_{OT}$, and H 
be a random oracle. The protocol $\Pi_{cppa}$ in Fig. 4 is a secure and zero-knowledge 
commit-and-prove predictable argument as defined in Definitions 4 and 5. 


Proof (Sketch). Completeness follows straightforwardly by the correctness prop- 
erty of the underlying OT and the garbling scheme. 

In order to show soundness, let us consider a PPT adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ 
and assume that (w, d, C, y) is a tuple returned by $\mathcal{A}_1$ that corresponds to a false 
statement. That is, x = (cm, C, y), where cm = Commit(w; d) and $C(w) \neq y$. We 
show that for (c, b) $\leftarrow$ Chall(cm, C, y), if $\mathcal{A}_2$ having c can compute the predicted 
answer b, then one can either break the sender security of the underlying OT 
protocol, or the authenticity of the garbling scheme. To show this reduction, we 
first note that b is the correct label $k^1$. Now, given that $C(w) \neq y$, there can be 
two cases where $\mathcal{A}_2$ can output $k^1$ with non-negligible probability. In the first 
case, $\mathcal{A}_2$ outputs $k^1$ by being able to compute invalid labels that do 
not correspond to its committed value. It is not hard to see that such an $\mathcal{A}_2$ can be 
used to break OT sender security. The reduction $\mathcal{B}$ proceeds as follows: $\mathcal{B}$ first 
computes a garbled circuit $\widetilde{C}$ and sends the labels to the OT challenger. Next, it 
extracts $\mathcal{A}$’s input w and forwards it as the choice bits of the receiver. The OT 
challenger computes the sender’s message either by invoking a real sender or by 
invoking the simulator, and sends it to the reduction, which forwards it to 
$\mathcal{A}_2$ together with $\widetilde{C}$ and a random T. Now, since $\mathcal{A}_2$ can compute $k^1$ only in the 
real execution of $\Pi_{OT}$, a successful $\mathcal{A}_2$ with non-negligible probability $\varepsilon$ implies 
that $\mathcal{B}$ can distinguish the real and simulated views of the OT protocol with 
probability at least $\varepsilon$. In the second case, where $\mathcal{A}_2$ does not use invalid labels 
but computes the correct $k^1$, it is straightforward to construct an adversary $\mathcal{B}$ 
that breaks the authenticity of the underlying garbling scheme by forging $k^1$ for 
a given garbled circuit $\widetilde{C}$. 

We now argue that $\Pi_{cppa}$ is zero-knowledge in the NPRO model. Let V* 
be a PPT adversary against the ZK property. We construct an efficient 
simulator Sim that simulates the protocol as follows. Sim observes V*’s calls to 
the random oracle, so that for every query H(u) made by V*, Sim records 
u in a set L. To simulate the first message, Sim invokes the simulator of 
$\Pi_{OT}$ for the corrupt receiver. Upon receiving V*’s message c, Sim parses c as 
$(\widetilde{C}, \{m_j^S\}_{j \in [n]}, T)$ and defines the set $R = \{H(u) \oplus T \mid u \in L\}$. For any $r \in R$ 
parsed as $r = r_1^S \| \ldots \| r_n^S$, Sim computes $(k_j^0, k_j^1) \leftarrow$ Ext$(m_j^R, m_j^S, r_j^S)$ for $j \in [n]$ 
and checks if Verify$(\hat{C}, \widetilde{C}, \{k_j^0, k_j^1\}_{j \in [n]}) = 1$. If there exists such $r \in R$, the 
simulator sends Y to V*, where $Y \in L$ is such that $r = H(Y) \oplus T$. Otherwise, Sim 
aborts the protocol. The output of the simulator is perfectly indistinguishable 
from the real distribution and this completes the proof. 


6 Applications: Witness Encryption with Decryptor Privacy 


Besides being a notion of theoretical interest, (commit-and-prove) predictable 
arguments with the zero-knowledge or witness-indistinguishability property also 
have applications in the context of witness encryption. Witness 
encryption (WE) is a powerful notion of encryption introduced by Garg et 
al. [17]. A WE scheme for an NP relation $R_{\mathsf{lpar}}$ allows one to encrypt a message 
m with respect to a statement x as ct $\leftarrow$ WE.Enc(lpar, m, x). The ciphertext can 
be decrypted as m $\leftarrow$ WE.Dec(ct, w) for any w such that $(x, w) \in R_{\mathsf{lpar}}$. Security 
guarantees that no adversary should learn any non-trivial information about m 
if $x \notin L_{\mathsf{lpar}}$, where $L_{\mathsf{lpar}}$ is the language corresponding to $R_{\mathsf{lpar}}$. More formally, 
we say that a WE is secure if it is complete and sound as defined below: 


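- Completeness. A WE has completeness if for all $\lambda \in \mathbb{N}$, for all $R_{\mathsf{lpar}} \in \mathcal{R}(\lambda)$, 
  for all m, and for all $(x, w) \in R_{\mathsf{lpar}}$, 

  $$\Pr[\, \mathsf{Dec}(\mathsf{Enc}(\mathsf{lpar}, x, m), w) = m \,] \geq 1 - negl(\lambda).$$

  If the probability is 1, we say the WE is perfectly complete. 
- Soundness. A WE has soundness if for all $\lambda \in \mathbb{N}$ and all PPT adversaries 
  $\mathcal{A}$, there exists a negligible function $negl(\lambda)$ such that for any $m_0, m_1$, 

  $$\Pr\left[\, b = b' \wedge x \notin L_{\mathsf{lpar}} \wedge |m_0| = |m_1| : R_{\mathsf{lpar}} \leftarrow_\$ \mathcal{R}(\lambda);\ x \leftarrow \mathcal{A}(\mathsf{lpar});\ b \leftarrow_\$ \{0,1\};\ ct \leftarrow \mathsf{Enc}(\mathsf{lpar}, x, m_b);\ b' \leftarrow \mathcal{A}(\mathsf{lpar}, x, ct) \,\right] \leq \frac{1}{2} + negl(\lambda).$$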
- Oracles and Primitives: A correct, authentic, and verifiable 
  garbling scheme GC = (Garble, Encode, Eval, Decode), a sender-extractable 
  2-round OT $\Pi_{OT}$, and a hash function $H : \{0,1\}^* \rightarrow \{0,1\}^{poly(\lambda)}$ 
  modeled as a random oracle. 

- P’s private input: $w \in \{0,1\}^n$, where $n = poly(\lambda)$. 

- Commitment Phase: P plays the role of the receiver in n instances of 
  $\Pi_{OT}$ and computes (cm, d) as follows: 

  1. Sample uniformly random $r_j^R$ from $\{0,1\}^{poly(\lambda)}$, and compute 
     $m_j^R \leftarrow \Pi^R_{OT}(w_j; r_j^R)$ for $j \in [n]$. 
  2. Define $cm = \{m_j^R\}_{j \in [n]}$ and $d = \{r_j^R\}_{j \in [n]}$. 

- Common inputs: A security parameter $\lambda$, and a statement x = 
  (cm, C, y), where C is a polynomial-size circuit. 

- Challenge: Let $\hat{C}$ be a circuit that realizes the following relation R: 
  R(x = (cm, C, y), (w, d)) = 1 iff (w, d) opens cm and C(w) = y. V plays 
  the role of the sender in n instances of $\Pi_{OT}$ and computes a pair (c, b) 
  of challenge-predicted answer as follows: 

  1. Compute $(\widetilde{C}, e, d) \leftarrow$ Garble$(1^\lambda, \hat{C})$, where $e := \{k_j^0, k_j^1\}_{j \in [n]}$, and 
     $d := (k^0, k^1)$. 
  2. For $j \in [n]$, sample uniformly random $r_j^S$ from $\{0,1\}^{poly(\lambda)}$, and compute 
     $m_j^S = \Pi^S_{OT}(m_j^R, (k_j^0, k_j^1); r_j^S)$. 
  3. Compute $T = H(k^1) \oplus r^S$, where $r^S = r_1^S \| \ldots \| r_n^S$. 
  4. Define $c = (\widetilde{C}, \{m_j^S\}_{j \in [n]}, T)$ and $b = k^1$, and send c to P. 

- Response: P proceeds as follows: 

  1. Execute $k_j^{w_j} = \Pi^O_{OT}(m_j^S, w_j, r_j^R)$ for $j \in [n]$. 
  2. Execute $Y$ = Eval$(\widetilde{C}, \{k_j^{w_j}\}_{j \in [n]})$. 
  3. Recover $r^S = H(Y) \oplus T$, and parse $r^S = r_1^S \| \ldots \| r_n^S$. 
  4. Reconstruct the sender’s inputs $(k_j^0, k_j^1) \leftarrow$ Ext$(m_j^R, m_j^S, r_j^S)$ for $j \in [n]$. 
     Abort if the extractor fails for some $j \in [n]$. 
  5. Send the predicted answer a = Y if Verify$(\hat{C}, \widetilde{C}, \{k_j^0, k_j^1\}_{j \in [n]}) = 1$, 
     and abort otherwise. 

- V accepts the proof iff a = b. 


Fig. 4. ZK-CPPA $\Pi_{cppa}$ based on GC and OT 


While being a very powerful notion, existing constructions of WE are not 
satisfactory, as they are either based on strong assumptions such as indistin- 
guishability obfuscation and multilinear maps [11,16,17,20], or based on new 
and unexplored algebraic structures [2]. 

As noted in [14], predictable arguments imply witness encryption, as one can 
encrypt a bit m by generating a challenge-answer pair (c, b) for the PA and 
defining the ciphertext as $(c, b \oplus m)$. Vice versa, a PA can be constructed from 
WE by encrypting a random bit m and then asking the prover to return m. 
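
The PA-to-WE direction can be made concrete with a small sketch. It reuses a Diffie-Hellman style SPHF as the underlying predictable argument and hashes the predicted answer down to one bit before masking the message. The toy group, the bit-extraction step `to_bit` and the function names are assumptions introduced here for illustration and are not taken from [14]; no security is claimed for these parameters.

```python
import hashlib
import secrets

P = 2**127 - 1            # toy prime modulus, illustration only
G1 = 3
G2 = pow(G1, 0xBEEF, P)   # hypothetical second base; its dlog must be unknown in practice


def chall(x):
    """PA challenge: projection key c and predicted answer b for statement x = (u1, u2)."""
    u1, u2 = x
    a1, a2 = secrets.randbelow(P - 1), secrets.randbelow(P - 1)
    c = pow(G1, a1, P) * pow(G2, a2, P) % P
    b = pow(u1, a1, P) * pow(u2, a2, P) % P
    return c, b


def resp(x, w, c):
    """PA response computed from the witness w."""
    return pow(c, w, P)


def to_bit(h):
    """Compress a group element to a single bit (assumed extraction step)."""
    return hashlib.sha256(h.to_bytes(16, "big")).digest()[0] & 1


def we_enc(x, m):
    """Encrypt bit m under statement x: ciphertext is (c, bit(b) XOR m)."""
    c, b = chall(x)
    return c, to_bit(b) ^ m


def we_dec(ct, x, w):
    """Decrypt with a witness by recomputing the predicted answer."""
    c, masked = ct
    return to_bit(resp(x, w, c)) ^ masked
```

Decryption succeeds exactly because the prover's response is predictable: a witness holder recomputes b and strips the mask.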




E with private inputs (x, m)          D with private input w 

Output: $\bot$                        Output: m if $(x, w) \in R_{\mathsf{lpar}}$ 


Fig. 5. Functionality of a WE scheme with decryptor privacy for a relation $R_{\mathsf{lpar}}$ 


Furthermore, it is not hard to show that commit-and-prove predictable 
arguments are also equivalent to a variant of witness encryption studied in [5,8]. 
It is therefore interesting to see the applications of predictable arguments with 
privacy in the context of witness encryption. While the standard definition of 
witness encryption requires the above properties, for some applications explained 
below, we may require some level of privacy for the decryptor as well. In other 
words, we may ask for a WE scheme that mimics the following functionality 
(see Fig. 5): the functionality is parameterized by a message space M and an 
NP relation $R_{\mathsf{lpar}}$. An encryptor E with private inputs $m \in M$ and bitstring x 
interacts with a decryptor D with private input w, at the end of which D 
outputs m iff $(x, w) \in R_{\mathsf{lpar}}$. Note that this is different from standard WE wherein 
the decryptor aims to obtain the message internally without revealing it to the 
environment. Here instead, the decrypted message is revealed to the 
encryptor, which may break the privacy of the decryptor. Since the encryptor knows 
the plaintext when running the encryption algorithm, one may wonder how the 
decrypted message can leak some information about the decryptor’s witness. In 
the full version [22], we provide an example to illustrate this scenario. 


6.1 Application: Dark Pools 


We now justify our model of WE with decryptor privacy. In our model, we 
are assuming that the decryptor D sends back the decrypted message to the 
encryptor E whereas in all previous works, the communication is non-interactive 
(i.e., “one-shot”) in the sense that there is only one message ct from E to D. 
Our motivating applications are dark pools and over-the-counter markets. Dark 
pools are anonymized trading platforms that allow parties to place invisible 
orders such that each party can only know their own orders. Such pools allow 
the investors to communicate only to those whose transaction conditions satisfy 
some constraints. At the same time, they should also guarantee that investors 
do not learn any information about traders’ secret information. 

In a recent work, Ngo et al. [23] introduced a new cryptographic primitive 
called Witness Key Agreement (WKA) as a tool to make this possible. In the 
dark pool scenario, a WKA allows a party E to securely agree on a secret key with 
another party D who owns a secret witness satisfying some arithmetic relation. 
More precisely, in the presence of a public bulletin board or a public blockchain, 
a WKA addresses the following problem: given n parties who have committed 
to their secret inputs w, and published the commitments cm anonymously on 
the blockchain, an investor E wants to agree on a key k with any party whose 
committed secret w satisfies some relation; i.e., C(w) = y, where C is an arbitrary 
arithmetic circuit specified by E. Similar to the NP relations defined in Sect. 5, one 
can set x = (cm, C, y) and let R be defined such that R(x, (w, d)) = 1 iff cm 
commits to w (with decommitment d) and C(w) = y. Once the secret key k 
is recovered by the legitimate party (i.e., any party with a valid witness (w, d) 
such that R(x, (w, d)) = 1), they together with the investor can secure their 
communication from any external party by using k. 

We now demonstrate how our construction of ZK-CPPA can be used as a 
drop-in replacement for a witness key agreement. At a high level, the protocol 
proceeds as follows. All parties first commit to their secret values w via cm $\leftarrow$ 
Commit(w; d), and publish the resulting commitments cm. Later, an investor who 
wishes to communicate only with participants whose secrets satisfy C(w) = y (for 
some arbitrarily chosen circuit C and value y) considers the following relation: 
R(x = (cm, C, y), (w, d)) = 1 iff C(w) = y and cm = Commit(w; d). Let us 
assume that $x_i = (cm_i, C, y)$ is the statement corresponding to party i. The 
investor now encrypts the secret key k under all such statements $x_i$.⁵ It is not 
hard to see that only a prover with a valid witness $(w_i, d_i)$ can decrypt the 
ciphertext. Moreover, since the construction is ZK, the decrypted message k says 
nothing about $(w_i, d_i)$, even if the ciphertext is generated maliciously. 


Efficiency and Comparison with [23]. In [23], the authors propose a WKA 
construction based on a type of Succinct Zero-Knowledge Non-Interactive Argu- 
ment of Knowledge Proof System (zk-SNARK) from non-interactive linear proof 
systems (NILP), where the verifier is designated. The construction at a high- 
level is as follows. A designated verifier—playing the role of the investor— first 
broadcasts a CRS as a challenge for the relation R of interest. Next, a prover 
publishes a partial zk-SNARK proof as a response for the committed value that 
satisfies R. Finally, the verifier using the partial proof can derive a shared secret 
key with the prover. 

We now compare our proposed construction for WKA with that of [23]. In 
contrast to our scheme, which is ZK, the construction of [23] only provides 
honest-verifier ZK. Moreover, the WKA in [23] requires an expensive trusted setup 
which should be invoked every time an investor $E_i$ asks for the preprocessing 
of a new CRS corresponding to the relation $R_i$ of $E_i$’s interest. On the other 
hand, the major downside of our scheme is that the size of the ciphertext grows 
linearly with the number of parties in the system, as the investor should encrypt 
the message under every existing commitment in the system, whereas the size 
of the ciphertext in [23] is independent of the number of parties. This suggests that 
there might well be a trade-off between the size of the ciphertext and the required 
number of trusted setups, and that our construction performs better when the number 
of parties is small. 


5 We again emphasize that we use the notions of PA and WE (and their 
“commit-and-prove” variants) interchangeably here, as the implication from one to the other 
is straightforward and shown in [14]. 


7 Conclusion and Open Problems 


In this work, we study predictable arguments with privacy properties and show 
their application to the construction of witness encryption schemes that require 
decryptor privacy. We also introduce CPPAs, which provide a weakening of 
predictability, and give an efficient construction using garbled circuit techniques. 
While we construct CPPA in the random oracle model, an interesting open 
question is whether PA also exists in this model. Another theoretical question 
left open by our work is whether a WI deterministic-prover argument (WI-DA) 
implies a WI-PA. While zero-knowledge deterministic-prover arguments (ZK-DA) 
were characterized in a recent work by Bitansky and Choudhuri [7], who 
showed that ZK-DA implies ZK-PA, it would be interesting to obtain the same 
characterization for the weaker notion of witness indistinguishability. Finally, 
finding more applications for CPPA would be an interesting direction. 
Abstract. Secure multiparty computation (MPC) has recently been 
increasingly adopted to secure cryptographic keys in enterprises, cloud 
infrastructure, and cryptocurrency and blockchain-related settings such 
as wallets and exchanges. Using MPC in blockchains and other distributed 
systems highlights the need to consider dynamic settings. In such dynamic 
settings, parties, and potentially even parameters of underlying secret 
sharing and corruption tolerance thresholds of sub-protocols, may change 
over the lifetime of the protocol. In particular, stronger threat models — in 
which mobile adversaries control a changing set of parties (up to t out of 
n involved parties at any instant), and may eventually corrupt all n par- 
ties over the course of a protocol’s execution — are becoming increasingly 
important for such real world deployments; secure protocols designed for 
such models are known as Proactive MPC (PMPC). 

In this work, we construct the first efficient PMPC protocol for dynamic
groups (where the set of parties changes over time) secure against a dishonest
majority of parties. Our PMPC protocol only requires $O(n^2)$ (amortized)
communication per secret, compared to existing PMPC protocols that require
$O(n^4)$ and only consider static groups with dishonest majorities. At the core
of our PMPC protocol is a new efficient technique to perform multiplication of
secret shared data (shared using a bivariate scheme) with $O(n\sqrt{n})$
communication with security against a dishonest majority without requiring
pre-computation. We also develop a new efficient bivariate batched proactive
secret sharing (PSS) protocol for dishonest majorities, which may be of
independent interest. This protocol enables multiple dealers to contribute
different secrets that are efficiently shared together in one batch; previous
batched PSS schemes required all secrets to come from a single dealer.
1 Introduction


Dynamic MPC settings, where parties and parameters of the underlying secret
sharing and sub-protocols can change during the execution of the protocol,
have attracted a lot of attention in recent years. Some of these settings
consider very powerful adversaries who can compromise dishonest majorities, i.e.,
active/malicious or passive/semi-honest parties that may add up to a majority.
Additionally, for long-lived computation and better security guarantees, stronger
threat models in which mobile adversaries [7,10] control a changing set of
parties (up to t out of the n parties at any instant), and may eventually corrupt
all n parties over the course of a protocol’s execution or the lifetime of
confidential inputs, are becoming increasingly attractive in real-world
deployments of MPC. MPC protocols withstanding such mobile adversaries are
typically called Proactive MPC (PMPC) [7,10].


Table 1. Overview of features and limitations of proactive secret sharing (PSS) and 
proactive MPC (PMPC) protocols. 


            Type       Batching  Dynamic  Dishonest  Fair         Subprotocols
                                 groups   majority   reconstruct  communication
                                                                  (amortized)
[1]         PMPC       ✓         ✗        ✗          ✗            $O(1)$
[2]         PSS/PMPC   ✓         ✓        ✗          ✗            $O(1)$
[3]         PSS only   ✗         ✗        ✓          ✓            $O(n^4)$
[5]         PMPC       ✗         ✗        ✓          ✓            $O(n^4)$
[4]         PSS only   ✓         ✓        ✓          ✓            $O(n^2)$
This work   PMPC       ✓         ✓        ✓          ✓            $O(n^2)$


Related work in proactive secret sharing (PSS) and PMPC, and the different 
settings considered, is listed in Table 1. In the honest majority setting, the early 
work of Baron, Eldefrawy, Lampkins, and Ostrovsky [1] introduces the frame- 
work to construct PMPC from PSS by computing the circuit layer by layer 
and (proactively) redistributing the parties’ secret shares after each layer. The 
PMPC protocol handles batching (i.e., the secret sharing contains many secrets 
operated on in a coefficient-wise manner) and static groups in the honest
majority setting. In a follow-up work, Baron, Eldefrawy, Lampkins, and Ostrovsky [2]
then consider the setting of dynamic groups. The study of PSS and PMPC in
the dishonest majority setting starts with the work of Dolev, Eldefrawy,
Lampkins, Ostrovsky, and Yung [3], in which they present a PSS scheme (without
batching) for static groups. Feasibility of constructing PMPC withstanding a
dishonest majority of parties was then demonstrated by Eldefrawy, Ostrovsky,
Park, and Yung in [5]. The subprotocols in the last two works have
communication complexity $O(n^4)$, which significantly hinders their practicality. The
schemes from [3] and [5] have the additional property of ensuring fair
reconstruction with the gradual sharing model from Hirt, Lucas, and Maurer [8]. The
PSS scheme from [3] was later revisited by Eldefrawy, Lepoint, and Leroux [4]:
they present an efficient PSS scheme with batching, fair reconstruct with no
complexity overhead, and dynamic groups, with security against mixed
adversaries that can compromise a majority of parties, but leave as future work
its extension to a full PMPC protocol. This naturally brings us to formulate the
following open problem:


Can we develop a communication-efficient PMPC protocol that handles
batching, with amortized communication $O(n^2)$ or less, for dynamic
groups, and with security against mixed adversaries that can compromise
a majority of parties?


1.1 Contributions 


In this work, we affirmatively answer this question by constructing an efficient
PMPC protocol with four key properties: (i) batching, (ii) suitability for dynamic
groups, (iii) security against a majority of active/malicious or passive/semi-honest
corruptions, and (iv) fair reconstruct with no complexity overhead
in the mixed adversarial setting proposed by Hirt, Lucas, and Maurer [8].
Our protocol achieves computational security, and its efficiency is enabled by
only requiring $O(n^2)$ (amortized) communication per secret when batching $O(n)$
secrets. Our communication model assumes a broadcast channel and pairwise
secure channels. Concretely, we make the following contributions:


1. We develop the first efficient fair PMPC for dishonest majorities and dynamic
groups, with $O(n^2)$ (amortized with batches of size $\ell = n - 2$) communication
(in both broadcast and secure channels). Our new PMPC protocol protects
secrecy of the inputs when active and passive corruptions are less than
$n - 3 - \sqrt{\ell}$ at any time of the protocol. Additionally, the computation is fair
if the number of active corruptions is less than $k$ and the number of passive
corruptions is less than $\min(n - k - \sqrt{\ell}, 2(n - k) - \ell)$ during the reconstruction,
for 1 < k < n/3 (cf. Theorem 1).

2. We develop a new efficient bivariate batched proactive secret sharing (PSS) 
Share protocol for dishonest majorities that enables multiple dealers to con- 
tribute different secrets that are shared together in one batch. Previous 
batched PSS schemes in the dishonest majority setting required all secrets 
to come from one dealer. 

3. At the core of the protocol is a new efficient sub-protocol for multiplying
secret-shared data (using a bivariate sharing of degree $d = n - 2$ for batches of
size $d$) with $O(n\sqrt{n})$ amortized communication, and secure when the number
of corruptions (either active or passive) is less than $n - 3 - \sqrt{n - 2}$ without
requiring pre-computation (cf. Theorem 2). The techniques developed to this
effect might be of independent interest.
1.2 Technical Overview


The previous PMPC protocol [5], proven secure in the dishonest majority setting
and for static groups, builds on top of the proactive secret sharing scheme
of [3] by augmenting it with protocols for adding and multiplying shares to
perform computation on the secret shares following the same (arithmetic) PMPC
blueprint as proposed in [1]. While additions are computed locally,
multiplications require using the standard GMW MPC protocol [6], so as to obtain a
proactive secret sharing of the product of two secrets. The asymptotic
communication efficiency of [5] is the same as that of [3], i.e., $O(n^4)$.

Similarly, we develop our new PMPC protocol for dynamic groups with
dishonest majorities on top of a recent PSS protocol [4] for the dishonest majority
setting. However, the PSS of [4] differs significantly from that of [3], and
extending [4] to an efficient PMPC protocol with (amortized) communication $O(n^2)$
requires care and new techniques; the rest of this section summarizes the main
intuition behind our construction.

Let us briefly recall the PSS construction of [4]. Secrets $s_1, \ldots, s_\ell$ are secret
shared among $n$ participants $P_1, \ldots, P_n$ by a dealer, who constructs a bivariate
polynomial $g$ such that:


– $g(x, \cdot)$ and $g(\cdot, y)$ are of degree at most $d \le n - 1$;
– the secrets are embedded as $g(\beta_j, \beta_j) = s_j$ for distinct $\beta_j$’s;
– the secret share of party $P_i$ is the polynomial $g(\alpha_i, \cdot)$, for distinct $\alpha_i$’s.


This sharing naturally supports additions: party $P_i$ will be able to locally add
its secret shares $g(\alpha_i, \cdot)$ (where $g$ is the bivariate polynomial for $s_1, \ldots, s_\ell$) and
$g'(\alpha_i, \cdot)$ (where $g'$ is the bivariate polynomial for $s'_1, \ldots, s'_\ell$) to obtain a secret
sharing of the sum of the secrets $(s_1 + s'_1, \ldots, s_\ell + s'_\ell)$.
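The additive property of the bivariate sharing can be checked with a small
sketch. It is only an illustration under assumptions that are not taken from
[4]: the modulus `Q`, the public points, the helper names, and the way a random
polynomial is corrected to hit the secrets on the diagonal are all hypothetical
choices, and commitments are omitted entirely.

```python
import random

Q = 2**31 - 1  # a prime standing in for the field modulus q (assumed value)


def lagrange_eval(points, values, x):
    """Evaluate at x the unique polynomial through (points[i], values[i]) mod Q."""
    total = 0
    for j, (pj, vj) in enumerate(zip(points, values)):
        num, den = 1, 1
        for m, pm in enumerate(points):
            if m != j:
                num = num * (x - pm) % Q
                den = den * (pj - pm) % Q
        total = (total + vj * num * pow(den, -1, Q)) % Q
    return total


def make_bivariate_sharing(secrets, betas, d, rng):
    """Return g(x, y), degree <= d in each variable, with g(beta_j, beta_j) = s_j."""
    coeffs = [[rng.randrange(Q) for _ in range(d + 1)] for _ in range(d + 1)]

    def r(x, y):  # random bivariate polynomial
        return sum(c * pow(x, a, Q) * pow(y, b, Q)
                   for a, row in enumerate(coeffs)
                   for b, c in enumerate(row)) % Q

    # Correction term (a polynomial in x only) that fixes the diagonal values.
    fix = [(s - r(b, b)) % Q for s, b in zip(secrets, betas)]
    return lambda x, y: (r(x, y) + lagrange_eval(betas, fix, x)) % Q


def demo(seed=0):
    rng = random.Random(seed)
    n, d = 6, 4                          # d = n - 2
    betas = [1, 2]                       # public points holding the secrets
    alphas = [10 + i for i in range(n)]  # public points of the parties
    s, s_prime = [11, 22], [100, 200]
    g = make_bivariate_sharing(s, betas, d, rng)
    g_prime = make_bivariate_sharing(s_prime, betas, d, rng)

    # Party P_i holds g(alpha_i, .) and g'(alpha_i, .); adding them is local.
    def summed_share(i, y):
        return (g(alphas[i], y) + g_prime(alphas[i], y)) % Q

    # Check: d + 1 summed shares interpolate to s_j + s'_j on the diagonal.
    recovered = []
    for b in betas:
        column = [summed_share(i, b) for i in range(d + 1)]
        recovered.append(lagrange_eval(alphas[:d + 1], column, b))
    return recovered
```

Calling `demo()` recombines the locally added shares and returns the
component-wise sums of the two batches.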


Contribution 1: Efficient Multi-dealer Batched Sharing Protocol. We point out here a
subtle feature that was not present in [3,5], that usually does not manifest
in the standard PSS functionality, and that was also lacking from [4]. Multi-dealer
batched sharing allows us to use the batching techniques (and the resulting
improvement in communication complexity) for computations where each
participant has $O(1)$ secrets, something impossible with single-dealer sharing protocols
because each participant has to do at least one sharing for its secrets. We obtain
this multi-dealer sharing with a simple adaptation of techniques used in [4]. It
suffices that all the dealers generate a classical Shamir sharing for each of their
secrets; a bivariate sharing for the whole batch is then obtained by combining
these univariate polynomials into a bivariate polynomial. Additionally,
performing addition and multiplication on batches of secrets requires permuting the secrets
between consecutive layers to align them. Thus, the underlying PSS needs to be
secure even when some of the shared secrets have been leaked to the adversary.
Contribution 2: Efficient Multiplication of Shared Secrets for Groups with
Dishonest Majorities. This protocol is at the core of our contributions, enabling
the construction of our new communication-efficient PMPC protocol. In fact,
our Mult protocol does even better than the minimal requirement of $O(n^2)$,
with an amortized communication complexity of $O(n\sqrt{n})$ against up to
approximately $n - \sqrt{n}$ actively corrupted participants. This improvement of $\sqrt{n}$ over the
standard quadratic complexity for multiplication without precomputation in the
dishonest majority setting may be of independent interest. It has the following
blueprint:


1. The participants have shares for two “bivariate” secret sharings $g$, $g'$, each
containing $\ell$ secrets to be multiplied together. First, $g$ is transformed into $\ell_2$
“univariate” secret sharings $f_1(\alpha_i), \ldots, f_{\ell_2}(\alpha_i)$, where each $f_j$ is of degree
$d$ and contains $\ell_1$ secrets (and similarly for $g'$). This step is done by each
party generating a random polynomial and using the Lagrange interpolation
formula to embed the secrets from $g$.

2. Then, $\ell_2$ “blinding” bivariate polynomials $h_j$ of degree $d$ such that
$h_j(\beta_i, \beta_i) = 0$ are generated. This step follows the classical approach of
generating blinding polynomials: each party generates a random polynomial
evaluating to 0 at the $\beta_i$’s.

3. Next, $\ell_2$ bivariate polynomials $g''_j = f_j(x) f'_j(y) + h_j(x, y)$ are computed, and
party $P_i$ learns $g''_j(\alpha_i, \cdot)$. Note that $h_j$ “blinds” the product of the polynomials
$f_j$ and $f'_j$ (except at $(\beta_i, \beta_i)$, where $g''_j$ evaluates to the product of the
secrets). This step uses the secure multiplication protocol introduced in [9] in
the context of threshold ECDSA.

4. Against active corruptions, correctness of the computation is verified using
additively homomorphic commitments. To verify the correctness of the
multiplication operations involved in the computation of $g''_j$, the participants
reveal $g''_j(\mathrm{rand}_j, \cdot)$ for some random values $\mathrm{rand}_j$. This reduces the security
threshold by one while preventing an adversary from deviating from the protocol
undetected.

5. Finally, all the $\ell_2$ bivariate secret sharings $g''_j(\alpha_i, \cdot)$ are recombined into a
single bivariate sharing $g''(\alpha_i, \cdot)$ that embeds the $\ell = \ell_1 \cdot \ell_2$ secrets.
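The algebra behind the blinded product of step 3 can be sketched as follows.
This is not the secure protocol: everything is computed in the clear by a
single process, whereas in the blueprint no party ever holds the whole
polynomials. The modulus `Q`, the points, and all helper names are hypothetical,
and the way the polynomials are sampled is an assumption made for illustration.

```python
import random

Q = 2**31 - 1  # prime field modulus (assumed value)


def lagrange_eval(points, values, x):
    """Evaluate at x the unique polynomial through (points[i], values[i]) mod Q."""
    total = 0
    for j, (pj, vj) in enumerate(zip(points, values)):
        num, den = 1, 1
        for m, pm in enumerate(points):
            if m != j:
                num = num * (x - pm) % Q
                den = den * (pj - pm) % Q
        total = (total + vj * num * pow(den, -1, Q)) % Q
    return total


def poly_eval(coeffs, x):
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def univariate_sharing(secrets, betas, d, rng):
    """Return f of degree <= d with f(beta_i) = s_i."""
    coeffs = [rng.randrange(Q) for _ in range(d + 1)]
    fix = [(s - poly_eval(coeffs, b)) % Q for s, b in zip(secrets, betas)]
    return lambda x: (poly_eval(coeffs, x) + lagrange_eval(betas, fix, x)) % Q


def blinding_poly(betas, d, rng):
    """Return h(x, y), degree <= d in each variable, with h(beta_i, beta_i) = 0."""
    coeffs = [[rng.randrange(Q) for _ in range(d + 1)] for _ in range(d + 1)]

    def r(x, y):
        return sum(c * pow(x, a, Q) * pow(y, b, Q)
                   for a, row in enumerate(coeffs)
                   for b, c in enumerate(row)) % Q

    diag = [r(b, b) for b in betas]
    return lambda x, y: (r(x, y) - lagrange_eval(betas, diag, x)) % Q


def demo(seed=1):
    rng = random.Random(seed)
    d, betas = 4, [1, 2]
    s, s_prime = [3, 5], [7, 11]
    f = univariate_sharing(s, betas, d, rng)
    f_prime = univariate_sharing(s_prime, betas, d, rng)
    h = blinding_poly(betas, d, rng)

    def g2(x, y):  # g''(x, y) = f(x) f'(y) + h(x, y)
        return (f(x) * f_prime(y) + h(x, y)) % Q

    # On the diagonal points the blinding vanishes and the products appear.
    return [g2(b, b) for b in betas]
```

`demo()` returns the component-wise products of the two batches, which is the
property that the recombination in step 5 relies on.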


1.3 Paper Outline 


The rest of the paper is organized as follows. Section 2 overviews preliminaries 
required for the paper. Section 3 revisits the PSS scheme of [4] for the setting of 
PMPC and introduces a new multi-dealer batched share sub-protocol. Section 4 
presents the ideal functionality and concrete instantiation of our new PMPC 
protocol for dynamic groups with dishonest majorities. Section 5 focuses on the
subprotocols of the overall PMPC protocol and proves their security; the formal
security proofs for the PMPC protocol are provided in the full version.
2 Preliminaries


Notation. Throughout the paper, we consider a set of $n$ parties $\mathcal{P} = \{P_1, \ldots, P_n\}$,
connected by pairwise synchronous secure channels and authenticated broadcast
channels. The parties in $\mathcal{P}$ want to securely perform computations over a finite
field $\mathbb{F} = \mathbb{Z}_q$ for a prime $q$.

For integers $a, b$, we denote $[a, b] = \{k : a \le k \le b\}$ and $[b] = [1, b]$.¹ We
denote by $\mathbb{P}_k$ the set of polynomials of degree exactly $k$ over $\mathbb{F}$. When a variable
$v$ is drawn randomly from a set $S$, we denote $v \leftarrow S$.


2.1 Adversary Model 


In this section, we briefly recall the proactive security model and the mixed 
adversary setting used in this work. For a more precise exposition, we refer the 
reader to [5, Sect. 2]. The adversary in this model is considered to be a mobile 
adversary that can adaptively decide which parties to (passively or actively) cor- 
rupt between predefined “refresh phases” of the protocol. The computation is 
thus divided into “operation phases”; for example, the circuit representing the 
computation can be expressed as layers followed by “refresh phases” in which a 
refresh protocol is performed to prevent the adversary from learning too much 
information. The adversary can retain all the states of a corrupted party, but 
once a party is uncorrupted the adversary cannot learn future states of such 
a previously corrupted party unless it re-corrupts the party. We assume that,
at any point during the execution of the protocol, the adversary
controls at most $N$ parties (passively or actively); $N$ is called the corruption
threshold, and when $N > n/2$, we are in the dishonest majority setting.
Finally, note that in the proactive security model, a party can be uncorrupted 
either because the adversary willingly releases control of said party to compro- 
mise another party while not violating the corruption threshold, or because the 
party was proactively rebooted to a pristine state (hence the term of proactive 
security); henceforth, the adversary loses control over the party. In both cases, 
the uncorrupted party can recover its shares with the help of the other parties 
using a recovery protocol, and can continue participating in the computation. 


2.2 Commitment Scheme 


A commitment scheme [11] is a classical cryptographic primitive. The
commitment to a message $m \in \mathbb{F}$, under randomness $r \in \mathbb{F}$, is written $C(m, r)$. The
opening information $o(m, r)$ can be revealed to enable a verifier to check whether
$C(m, r)$ was indeed a valid commitment to $m$. A commitment scheme is
computationally hiding if $C(m, r)$ does not reveal information to a computationally
bounded attacker. It is perfectly binding if a commitment $C(m_1, r)$ can never be
opened with $o(m_2, r')$ when $m_1 \ne m_2$.


1 In particular, if $a > b$, we have $[a, b] = \emptyset$.
In this paper, we use an additively homomorphic commitment scheme, i.e.,
there is an operation $\star$ such that $C(m_1, r_1) \star C(m_2, r_2) = C(m_1 + m_2, r_1 + r_2)$.
In particular, we instantiate our protocols with the computationally hiding and
perfectly binding commitment scheme $C(m, r) = (g^m h^r, g^r) \in G^2$, where $G$ is a group of
prime order $p$ with generator $g$. This scheme is secure under the hardness of
the DDH problem. For any element $(g_1, g_2) \in G^2$, there exists a unique value $m$
and randomness $r$ such that $(g_1, g_2) = C(m, r)$. This fact will help us simplify
some protocols and proofs.
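A toy instance of this commitment makes the homomorphic property concrete. The
parameters below are assumptions chosen only so that the example runs: a
subgroup of prime order 1019 inside the integers modulo the safe prime 2039,
with `G = 4` and `H = 9` as two generators of that subgroup. Such sizes offer no
security, and in a real instantiation the discrete logarithm of `H` with respect
to `G` must be unknown.

```python
P_MOD = 2039   # safe prime, P_MOD = 2 * ORDER + 1 (toy size, assumed)
ORDER = 1019   # prime order of the subgroup of squares modulo P_MOD
G = 4          # generator of the subgroup (2 squared)
H = 9          # second generator (3 squared), hypothetical choice


def commit(m, r):
    """C(m, r) = (g^m h^r, g^r)."""
    return (pow(G, m, P_MOD) * pow(H, r, P_MOD) % P_MOD, pow(G, r, P_MOD))


def combine(c1, c2):
    """Homomorphic operation: component-wise multiplication in the group."""
    return (c1[0] * c2[0] % P_MOD, c1[1] * c2[1] % P_MOD)


def verify_opening(c, m, r):
    """Check that (m, r) opens the commitment c."""
    return c == commit(m, r)


# Messages and randomness add modulo the group order.
m1, r1, m2, r2 = 1000, 600, 30, 700
lhs = combine(commit(m1, r1), commit(m2, r2))
rhs = commit((m1 + m2) % ORDER, (r1 + r2) % ORDER)
```

Here `lhs == rhs`, and the second component `g^r` fixes `r`, after which the
first component fixes `m`; this is the uniqueness property mentioned above.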

Finally, we naturally extend the definition to commitments on polynomials
by providing a vector of commitments for the coefficients of the polynomial. For
a polynomial $f$, we denote by $C(f, R_f)$ the commitment to $f$.


2.3 Shares and Sharings 


In the following, we will use two kinds of secret sharings: univariate and bivariate. 
In both cases, the term sharing is used to denote a polynomial (either univariate 
or bivariate). The secrets are stored in the evaluations of this sharing on publicly 
known points. In this context, one share will always refer to the information held 
by one participant (the evaluation of the sharing on one point in the univariate 
setting, or a univariate polynomial in the bivariate setting). Hence, a univariate 
share is a point, while a bivariate share is a univariate polynomial. With these 
conventions and the notations of Sect. 2.2, the meaning of a commitment to a 
share or to a sharing is clear. 

More precisely, when talking about univariate sharing we refer to the classical
Shamir secret sharing. Thus, a sharing $f$ of degree $d$ for the batch of secrets
$s_1, \ldots, s_\ell$ between $n$ participants is a univariate polynomial $f$ of degree $d$ that
satisfies $f(\beta_j) = s_j$ for all $j \in [\ell]$, and the share of each party $P_r$ is the evaluation
$f(\alpha_r)$, for a set of public values $\beta_1, \ldots, \beta_\ell, \alpha_1, \ldots, \alpha_n$. In that case, it can be shown
that the corruption threshold for secrecy on the $s_1, \ldots, s_\ell$ is $d + 1 - \ell$.
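A minimal sketch of such a batched univariate sharing is given below. The
modulus `Q`, the public points, and the function names are hypothetical, and the
sampling strategy (fixing the polynomial by the secrets plus random values on
the first few party points) is one possible choice rather than the method of
any cited work.

```python
import random

Q = 2**31 - 1  # prime field modulus (assumed value)


def lagrange_eval(points, values, x):
    """Evaluate at x the unique polynomial through (points[i], values[i]) mod Q."""
    total = 0
    for j, (pj, vj) in enumerate(zip(points, values)):
        num, den = 1, 1
        for m, pm in enumerate(points):
            if m != j:
                num = num * (x - pm) % Q
                den = den * (pj - pm) % Q
        total = (total + vj * num * pow(den, -1, Q)) % Q
    return total


def share_batch(secrets, betas, alphas, d, rng):
    """Return the shares f(alpha_r) of a degree <= d polynomial with f(beta_j) = s_j."""
    free = d + 1 - len(secrets)  # values that can still be chosen at random
    points = list(betas) + list(alphas[:free])
    values = list(secrets) + [rng.randrange(Q) for _ in range(free)]
    return [lagrange_eval(points, values, a) for a in alphas]


def reconstruct(alpha_subset, share_subset, betas):
    """Recover the batch from any d + 1 shares by interpolating at the beta_j."""
    return [lagrange_eval(alpha_subset, share_subset, b) for b in betas]


def demo(seed=0):
    rng = random.Random(seed)
    n, d = 6, 4
    betas = [1, 2]
    alphas = [10 + i for i in range(n)]
    shares = share_batch([42, 7], betas, alphas, d, rng)
    # Any d + 1 = 5 parties suffice; here parties 2..6 cooperate.
    return reconstruct(alphas[1:6], shares[1:6], betas)
```

With these toy parameters, $d + 1 - \ell = 3$ values of the polynomial remain free
once the two secrets are fixed, which matches the threshold stated above.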

For the bivariate sharing, we use the construction introduced in [4]. A
bivariate sharing $g$ of degree $d$ is a bivariate polynomial of degree $d$ in both variables
with $g(\beta_j, \beta_j) = s_j$ for $j \in [\ell]$. In that case, the share of the participant $P_r$ is
the univariate polynomial $g(\alpha_r, \cdot)$. For efficiency reasons in the PSS from [4],
it is also possible that $P_r$ ends up with the knowledge of the univariate
polynomial $g(\cdot, \alpha_r)$. It was shown in [4] that the corruption threshold is $d + 1 - \sqrt{\ell}$ for
secrecy. Usually, we choose the biggest value possible for $d$. First, it is clear from
the way the sharings are distributed that $d$ must be smaller than or equal to $n - 1$. In
the case of proactive secret sharing, we also require $d \le n - 2$ because the PSS
functionality from [3,4] requires regularly performing a Recover protocol where
$d + 1$ participants cooperate to recover the shares of another party. Since
there is no other constraint, we usually take $d = n - 2$. In the rest of the article,
we often use the fact that $d \approx n$ implicitly. When concrete security thresholds
are given, we either state the formula with $d$ or replace $d$ by the value $n - 2$. In
terms of the number of secrets $\ell$, the PSS from [4] requires $\ell \le d$ and we keep
this restriction in this paper.
2.4 Polynomials and Degrees of Freedom


In this section, we introduce the notion of degree of freedom with respect to a 
set of equations for a polynomial. This definition will prove useful to clarify and 
formalize some statements later. For the rest of this paragraph we fix $f$ to be a
polynomial of degree $d$ (either univariate or bivariate) over a field $k$. We define
an equation on $f$ as an equality of the following form


$$\sum_{x \in X} f(x) = C \qquad (1)$$


where $X$ is a finite set of points ($X \subset k$ if $f$ is univariate and $X \subset k^2$ if $f$
is bivariate) and $C \in k$. In the special case where $X = \{z\}$, we call this the
evaluation equation on $z$.

A system of equations on f is composed of several such equations as follows: 


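Definition 1. A system of equations $E$ on $f$ is a finite set of equations


$$E = \Big\{ \sum_{x \in X_i} f(x) = C_i \Big\}_{i \in I}$$


where $X_i \subset k$ and $C_i \in k$ for all $i \in I \subset \mathbb{N}$. When $\exists i$ such that $x \in X_i$, we write
$f(x) \in E$.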


Since polynomials of a given degree $d$ are elements of a finite-dimensional vector
space, it makes sense to talk about independent equations (in the classical sense). Hence,
the dimension of a system of equations is the number of independent equations
in that system. This is a terminology that we will use throughout this paper.


Definition 2. Let $E$ be a system of equations as per Definition 1. The degree
of freedom of $f$ with respect to $E$ is the dimension of $E$ subtracted from the
dimension of $f$ and is denoted by $d_f(E)$.


In Definition 2, by dimension of $f$, we mean the dimension of the space in which $f$
lives (the dimension is $d + 1$ for univariate polynomials of degree $d$ and $(d + 1)^2$
for bivariate polynomials of degree $d$). Another definition of the dimension could
be the maximum size of an independent system of equations on $f$; we note that
the degree of freedom is always a non-negative integer.

We now illustrate how this terminology helps formulate some security
statements. Let us consider a univariate sharing $f$ of $m$ secrets and a set of
parties $\{P_1, \ldots, P_t\}$ corrupted by an adversary $\mathcal{A}$. From the corruption, $\mathcal{A}$ learns the share
$f(\alpha_r)$ of every corrupted party $P_r$. This can be seen as a set of $t$ equations on $f$.
Provided that no other equation on $f$ is leaked, the adversary has gathered a
system of $t$ independent equations. Thus, the degree of freedom of $f$ with respect
to the system of $\mathcal{A}$ is $d + 1 - t$. We have perfect secrecy on the $m$ secrets if $m$
is smaller than this degree of freedom. Intuitively, this notion of degree of
freedom relates to the number of secrets that can be hidden inside a polynomial.
It comes in especially handy when dealing with bivariate polynomials as we do in
this article.
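For univariate polynomials, the degree of freedom can be computed mechanically
as a rank over the field. The sketch below is an illustration under stated
assumptions: each equation is represented only by its point set $X$ (the
right-hand side does not affect independence), the field is the integers modulo
a small prime `q`, and the function names are hypothetical.

```python
def rank_mod(rows, q):
    """Rank of a matrix over the integers modulo the prime q (Gaussian elimination)."""
    rows = [r[:] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % q), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, q)
        rows[rank] = [v * inv % q for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % q:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % q for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def degrees_of_freedom(point_sets, d, q):
    """d + 1 minus the number of independent equations sum_{x in X} f(x) = C."""
    # Each equation is linear in the d + 1 coefficients of f.
    rows = [[sum(pow(x, k, q) for x in X) % q for k in range(d + 1)]
            for X in point_sets]
    return (d + 1) - rank_mod(rows, q)
```

For a degree-5 polynomial over the integers modulo 101, three evaluation
equations leave `degrees_of_freedom([[2], [3], [5]], 5, 101)` degrees of freedom,
and adding the equation on $X = \{2, 3\}$ changes nothing because it is the sum
of two equations already in the system.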
3 Proactive Secret Sharing


Constructing MPC from PSS is a natural and well-established approach. In 
this work, we build upon the PSS from [4] to obtain efficient PMPC. Before 
introducing our new generic PMPC protocol (see Sect. 4), we need to adapt 
slightly the scheme from [4]. 

In fact, the issue is not with the protocols from [4] per se, but rather with 
the proofs and security thresholds. In the PMPC framework, operations are
performed component-wise on batches of secrets, creating a sharing of $s_1 \star t_1, \ldots, s_\ell \star t_\ell$
from sharings of $s_1, \ldots, s_\ell$ and $t_1, \ldots, t_\ell$ (for the desired operation $\star$). In a generic
arithmetic circuit, there is no guarantee that all the secrets are aligned before 
each layer of computation. That is why it is standard to use a Permute protocol 
to realign the secrets before each round. As a result, the participants will pro- 
duce some sharings where secrets coming from different participants might end 
up in the same batch. This is why we need to ensure secrecy in the setting of 
a batched sharing where some of the secrets are known to the adversary. In [4] 
where the Share is always performed by a single dealer and the batch of secrets 
are not reorganized, this situation never happens. Thus, the proofs from [4] need 
to be updated to show that the protocols retain the desired security in this case. 
We postpone this analysis to the full version of this paper due to lack of space. 

In Protocol 1, we introduce an extension of the Share protocol to the case
of multiple dealers, allowing several participants to cooperate and generate a
common secret sharing of their secrets. To add more flexibility, we also make
it possible to add secrets to an existing sharing when the threshold for the
maximal number of secrets has not been reached. This extension is quite natural
given what we said above and allows us to obtain the improvement in the
communication complexity due to batching even in situations where each participant
has $O(1)$ secrets (which would not be possible in a single-dealer setting since each
participant has to produce at least one sharing).

In the protocol below, when $\ell_1 = 0$, we assume that there is no bivariate
secret sharing $g$. We build our Share protocol upon the building block Recover,
which is part of the PSS from [4]. It can be used by $d + 1$ participants having
shares for a bivariate sharing $g$ of degree $d$ to distribute a set of shares for $g$ to
another participant.

The security for Protocol 1 is stated in Lemma 1. In all the protocols in this
work, we highlight the | critical steps using boxes |, as the full protocols include
(standard) use of commitments and openings to resist malicious/mixed
adversaries.


Protocol 1. Share protocol 


INPUT: A subset of dealer participants $P_D \subseteq \{P_1, \ldots, P_n\}$. A partition
$\cup_{P_r \in P_D} S_r$ of $\{s_{\ell_1 + 1}, \ldots, s_{\ell_2}\}$ where each $P_r$ knows the elements of $S_r$. A
bivariate secret sharing $g$ with commitment $C(g, R_g)$ for the batch of secrets
$\{s_1, \ldots, s_{\ell_1}\}$.

OUTPUT: Distributes a bivariate sharing $g'$ for the secrets $\{s_1, \ldots, s_{\ell_2}\}$.

1. For each $P_r \in P_D$ and each $s_j \in S_r$, | $P_r$ samples $g_j, R_{g_j} \leftarrow \mathbb{P}_d$ | such
that | $g_j(\beta_j) = s_j$ | and broadcasts the commitments $C(g_j, R_{g_j})$.

2. For all $r' \in [d + 1]$ and $s_j \in S_r$, each | $P_r \in P_D$ sends
$o(g_j(\alpha_{r'}), R_{g_j}(\alpha_{r'}))$ to $P_{r'}$ |. The receiver $P_{r'}$ broadcasts a bit indicating
if the opening is correct. For each share for which an irregularity was
reported, $P_r$ broadcasts the opening. If the opening is correct, $P_{r'}$ accepts
the value, otherwise $P_r$ is disqualified and added to the set of corrupted
parties $B$. The protocol aborts and each party outputs $B$.

3. Each $P_r \in \{P_1, \ldots, P_n\}$ | samples $q_{r,j}, R_{q_{r,j}} \leftarrow \mathbb{P}_d$ | with | $q_{r,j}(\beta_j) = 0$ |
for all $j \in [\ell_1 + 1, \ell_2]$.

4. For all $r \in [n]$, $r' \in [d + 1]$, and $s_j \in S$, each | $P_r$ sends
$o(q_{r,j}(\alpha_{r'}), R_{q_{r,j}}(\alpha_{r'}))$ to $P_{r'}$ |. The receiver $P_{r'}$ broadcasts a bit
indicating if the opening is correct. For each share for which an irregularity
was reported, $P_r$ broadcasts the opening. If the opening is correct, $P_{r'}$
accepts the value, otherwise $P_r$ is disqualified and added to the set of
corrupted parties $B$. The protocol aborts and each party outputs $B$.

5. Each $P_r \in \{P_1, \ldots, P_{d+1}\}$ | samples $g_r, R_{g_r} \leftarrow \mathbb{P}_d$ | such that
| $g_r(\beta_j) = g_j(\alpha_r) + \sum_{u=1}^{n} q_{u,j}(\alpha_r)$ | for all $j \in [\ell_1 + 1, \ell_2]$ and
| $g_r(\beta_j) = g(\alpha_r, \beta_j)$ | (and the same for $R_{g_r}$ with respect to the
polynomials $R_{g_j}, R_{q_{u,j}}, R_g$) for all $j \in [\ell_1]$. $P_r$ broadcasts the commitments
$C(g_r, R_{g_r})$.
Note that this implicitly defines $g'$, a random bivariate polynomial of
degree $d$ with $g'(\alpha_r, \cdot) = g_r(\cdot)$.

6. For all $r \in [d + 1]$ and $j \in [\ell_1 + 1, \ell_2]$, each party locally computes the
commitments $C(g_r(\beta_j), R_{g_r}(\beta_j))$, $C(g_j(\alpha_r), R_{g_j}(\alpha_r))$ and
$C(q_{u,j}(\alpha_r), R_{q_{u,j}}(\alpha_r))$ for all $u \in [n]$ before verifying the relation


$$C(g_r(\beta_j), R_{g_r}(\beta_j)) = C(g_j(\alpha_r), R_{g_j}(\alpha_r)) \star \bigstar_{u=1}^{n} C(q_{u,j}(\alpha_r), R_{q_{u,j}}(\alpha_r)).$$


7. For $r' \in [d + 2, n]$, $\{P_1, \ldots, P_{d+1}\} \cup \{P_{r'}\}$ perform | Recover on $g'$ |.

We write $t_P$ (resp. $t_A$) for the number of passively (resp. actively)
corrupted participants. Lemma 1 informally summarizes the security of Protocol 1
but should not be considered as a formal security statement. In the full
version of this article, we use several such preliminary lemmas to formally prove
Theorem 2.


Lemma 1 (Informal). Let g be a bivariate sharing for lı secrets s1,..., Sg 
such that the adversary knows €, < 4 of those shared secrets (and no other 
information aside from the prescribed shares) and let s2,41,..., Se, be new secrets 
among which (,—€1 < l2—4 values are known to the adversary. When (+2 < d 
and tp,ta <d+1—Vé, the Share protocol above is correct and preserves secrecy 
of the (la — £) + (41 — £) secrets unknown to the adversary under the hardness
of DDH. Additionally, apart from the shares of corrupted participants and the 
secrets already known, the adversary does not learn any other evaluation of the 
sharing g'. 


A proof for Lemma 1 can be obtained with ideas similar to the ones used in [4] 
to prove security of their PSS scheme. 


Communication Complexity: the above protocol requires $O(dn(\ell_2 - \ell_1))$
communication. Thus, it yields an amortized communication complexity of $O(n^2)$ when
$d \approx n$.
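The amortization step can be spelled out as follows, assuming that "amortized"
means the cost per newly shared secret and taking $d = n - 2$ as in Sect. 2.3:

```latex
\[
  \frac{O\bigl(d \, n \, (\ell_2 - \ell_1)\bigr)}{\ell_2 - \ell_1}
  \;=\; O(dn)
  \;=\; O\bigl((n - 2)\, n\bigr)
  \;=\; O(n^2).
\]
```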


4 Communication-Efficient Proactive MPC (PMPC) 
for Dynamic Groups with Dishonest Majorities 


We follow the standard blueprint that develops PMPC based on PSS (e.g., [1,5])
for arithmetic circuits. An arithmetic circuit can be divided into consecutive layers
(each consisting of additions or multiplications) such that the outputs of a layer
are only used once in the next layer.²

The outline of our PMPC protocol (Protocol 2) is similar to the one of [5]. We 
provide a brief summary below and refer the reader to [5] for more details. Our 
PMPC protocol consists of 8 sub-protocols listed below with a quick summary 
of their purpose. We put the tag (PSS, denoting Proactive Secret Sharing) to 
indicate protocols that are not introduced in this work; for those we use the 
construction from [4]. 


— Share: Takes a batch of secrets and produces a secret sharing. (Protocol 1) 

— Refresh: Rerandomizes a secret sharing. (PSS) 

— Recover: Produces a share of an existing sharing. (PSS) 

— Redistribute: Changes the number of participants for a sharing. (PSS) 

— Reconstruct: Takes shares of a secret sharing and recovers the secrets. (PSS) 

— Add: Performs component-wise additions of two sharings. 

— Mult: Performs component-wise multiplications of two sharings. 

— Permute: Takes a set of secret sharings and applies a permutation on all the 
secrets. 


The main idea is to use secret sharings to keep the inputs private: several Share
executions are performed at the beginning to create secret sharings of the inputs, and all
the remaining computations are performed using such secret sharings until the
last layer of the circuit, where Reconstruct is used to compute the outputs.
Refresh and Recover are the two sub-protocols that make the scheme (proac- 
tively) secure against mobile adversaries. In our adversarial model, we assume 
that the adversary can only change the set of corrupted participants during the 


2 Multiple uses can be handled easily by duplicating some sharings according to the
circuit’s requirements, but we avoid them entirely to simplify the explanations.
Refresh phase. Therefore, the frequency of Refresh executions can be adjusted
and provides a tradeoff between security and efficiency. For maximal security it
can be performed after every other sub-protocol. For simplicity in Protocol 2, we
refresh at every layer of the arithmetic circuit computation, and more precisely
we denote by $R$ the set of layers $\lambda$ after which we perform a Refresh operation
(see Step 3e). Recover is used when parties are “decorrupted” and reset to a
pristine default state after which they need to obtain shares to participate in the
computation. The goal of Redistribute is to handle dynamic groups. These five
protocols constitute the PSS from [4]. We extend their PSS scheme with Add
and Mult protocols to evaluate the gates of the arithmetic circuits to be
computed over the secret sharings. Since our PMPC protocol works with batches
of secrets, we also introduce a Permute protocol that permutes the underlying
shared secrets to align them correctly to perform Add and Mult.
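The role of Permute can be seen on plain values, without any secret sharing.
The sketch below only illustrates the alignment: the gate encoding, the
function name `eval_layer`, and the modulus are hypothetical, and a real
execution would apply the same pattern to sharings rather than to cleartext
values.

```python
from operator import add, mul

OPS = {"add": add, "mul": mul}


def eval_layer(values, gates, q):
    """Evaluate one circuit layer; gates is a list of (op, left_index, right_index)."""
    outputs = [None] * len(gates)
    for op_name, op in OPS.items():
        idx = [i for i, g in enumerate(gates) if g[0] == op_name]
        if not idx:
            continue
        # "Permute": gather operands so that slot k of both batches feeds the same gate.
        left = [values[gates[i][1]] for i in idx]
        right = [values[gates[i][2]] for i in idx]
        # "Add" / "Mult": one component-wise operation on the two aligned batches.
        batch_out = [op(a, b) % q for a, b in zip(left, right)]
        for i, out in zip(idx, batch_out):
            outputs[i] = out
    return outputs


layer = [("mul", 0, 2), ("mul", 1, 3), ("add", 0, 1)]
result = eval_layer([3, 5, 7, 11], layer, 101)
```

The two multiplication gates are served by a single component-wise
multiplication of the batches (3, 5) and (7, 11), which is where batching
saves communication.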

To handle dynamic groups, we assume for simplicity that the dynamic
changes are planned before the execution of the protocol. The Redistribute
protocol will be performed between consecutive layers of computations. The set $\mathcal{L}_\lambda$ is the
set of leaving parties after the execution of layer $\lambda$. Similarly, $\mathcal{M}_\lambda$ is the set of
new parties after layer $\lambda$. Due to the batching, there is also a need to reorder
the secrets before performing the layer computation (see the full version of the
paper). $S_\lambda$ is the set of secrets after execution of layer $\lambda - 1$ (i.e., after the arrival
of $\mathcal{M}_\lambda$ and departure of $\mathcal{L}_\lambda$), and $\sigma_\lambda$ is the permutation to be performed on $S_\lambda$ before
the computation of layer $\lambda$. $S_1$ is just the set of inputs and $\sigma_1$ is the identity.


Protocol 2. PMPC for Dynamic Groups with Dishonest Majorities 


INPUT: An arithmetic circuit $C$ of depth $d_C$ that has inputs $x_1, \ldots, x_n$
where each $x_i$ is a vector of $m_i$ input values for the participant $P_i$. $R$ is
the set of layers after which a refresh phase is to be performed.

OUTPUT: $n + k$ values $y_1, \ldots, y_{n+k}$ for some $k \in \mathbb{N}$ where the total set of
parties participating in the computation of $C$ is $\{P_1, \ldots, P_{n+k}\}$. Each $y_i$ is
either the output values of the circuit $Y$ or a special symbol $\bot$ indicating
that $P_i$ does not receive any output. We denote by $O$ the set of parties with
non-$\bot$ output.


1. The participants label all the secrets involved in the computation and 
group them in batches of size $\ell$. Then, the participants perform several 
executions of Share to distribute sharings of all the secrets. After this 
point, all the values on the input wires of C that involve participants 
$P_1, \dots, P_n$ are shared among all the other parties. In particular, all the 
input wires of the first layer of C are shared. 

2. Run the Refresh protocol. This corresponds to one refresh phase. 

3. For each circuit layer $\lambda = 1, \dots, d_C$: 

(a) A permutation $\sigma_\lambda$ of all the shared secrets is performed with the 
protocol Permute to align the secrets involved in all the gates of the 
layer $\lambda$. 

(b) For each batch of addition or multiplication gates in layer $\lambda$: Compute 
a sharing of a batch of outputs using Add or Mult. 

(c) The set of leaving participants $\mathcal{L}_\lambda$ exits the execution of the protocol 
using the Redistribute protocol. Then, the set of new parties $\mathcal{N}_\lambda$ 
is introduced with a new execution of Redistribute. 

(d) All the participants may perform multiple executions of Share so 
that all the inputs of the gates of the layer $\lambda + 1$ are shared among 
the parties, possibly overwriting the old secrets that will not be 
reused during the rest of the computation. 

(e) If $\lambda \in R$, run the Refresh protocol. 

4. At the end of the previous step, the parties are supposed to have a sharing 
for all the values in the output Y. We also assume that all the parties $P_i$ 
with $y_i \neq \bot$ are among the set of parties at this time of the protocol. The 
parties in O perform a Permute protocol to regroup the output values in 
a set of bivariate sharings. 

5. The Reconstruct protocol is performed several times so that all the values 
in Y are revealed to all $P_i \in O$. 


6. Two special operations may be run during the execution of the protocol. 
— Upon receiving a message Help! from a party $P_r$, all the parties 
execute Recover several times to provide $P_r$ with sharings of all the secret 
values required for the later computations of the protocol. If the 
procedure occurs after the sharing by $P_r$ of the value $\rho_r$, the other 
parties also reveal to $P_r$ their share of the sharing so that $P_r$ can 
compute the value $\rho_r$ for himself. 
— If one of the participants exits the protocol without prior agreement, 
the remaining parties perform Redistribute in the corrupted mode 
to distribute all secrets between them with a proper sharing. 


Remark 1. For ease of exposition, we assumed that all the outputs (represented 
by the set Y) are revealed to all the participants in O. In reality, the protocols 
often require that each participant obtain a different output. We can apply standard 
techniques to modify Protocol 2 in order to handle this functionality. For 
a given sharing containing several outputs that are to be revealed to different 
participants, it suffices that each of these participants generates a new sharing of 
zeroes and a random value at the index of the desired output. Then, each of 
these sharings can be added to the initial sharing with Add. Finally, the participants 
perform Reconstruct on the sharing obtained after all the operations. 
Thus, the participants will learn values $s_1 + \rho_1, \dots, s_\ell + \rho_\ell$ where the $\rho_i$ are the 
random values. The participant that is supposed to get the value $s_i$ will be able 
to remove $\rho_i$ since he was the one to generate it, but the other participants will not learn 
anything about $s_i$. 
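
The masking idea of Remark 1 can be sketched in a few lines. This is a minimal illustration only: it works directly on the cleartext values instead of on sharings, the prime modulus `P` and the function names `masked_reveal` and `unmask` are assumptions made for the sketch, and the Add and Reconstruct sub-protocols are replaced by plain modular arithmetic.

```python
import random

P = 2**61 - 1  # prime field modulus (an assumption for this sketch)

def masked_reveal(secrets, rng):
    """Each recipient i contributes a private mask rho_i at its own slot.

    Returns the masks (each known only to its recipient) and the values
    s_i + rho_i that a public reconstruction would reveal to everyone.
    """
    masks = [rng.randrange(P) for _ in secrets]
    opened = [(s + rho) % P for s, rho in zip(secrets, masks)]
    return masks, opened

def unmask(opened_value, own_mask):
    """Recipient i removes its own mask to recover s_i."""
    return (opened_value - own_mask) % P
```

Only the holder of a given mask can strip it, so publishing the whole masked batch reveals each output to its intended recipient and to nobody else.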

In the full version, we prove Theorem 1, which shows that Protocol 2 securely 
realizes the ideal functionality IDEAL**°'7"™"¢ (also defined in the full version). 


The thresholds are computed by taking $d = n - 2$ (which is the maximum possible 
value). 

Theorem 1. Consider a circuit C where the minimum number of participants is 
$n_0$. When the batch size is $\ell = n_0 - 2$, and assuming the hardness of DDH, 
the protocol PMPC introduced in Protocol 2 securely realizes the ideal process 
IDEALS min MPC against any adversary bounded at any given time by the multi- 
threshold T(€) = {(n—3- Vin—3- vo} with n the number of participants at 
that time. Additionally, when the adversary is also bounded by the multi-threshold 
T; (2) = {(k, min(n—k—-V£,2(n—k)-£,1< k < n/3}, WP!PS securely realizes 
IDEALA MPC, 


5 Subprotocols for PMPC 


In Sects. 5.1 to 5.4, we introduce the sub-protocols used in the Mult protocol 
described at a high level in Sect. 1.2. Next, we introduce the full multiplication 
protocol in Sect. 5.5. We defer the treatment of reordering the secrets to the full 
version as it follows essentially from previous work. 


5.1 Bivariate to Univariate Sharing 


The subprotocol from this section efficiently transforms a (bivariate) secret sharing 
of a batch of $\ell$ secrets $s_1, \dots, s_\ell$ into several univariate sharings of smaller 
size. For simplicity, we treat the case where $\ell = vm$ and construct $v$ univariate 
sharings of $m$ secrets each. This procedure is described in Protocol 3. 

We denote by $g$ the bivariate sharing of degree $d$ and by $(f_k)_{k \in [v]}$ the univariate 
sharings of degree $d$. We write $I_k = [(k-1)m + 1, km]$ and each $f_k$ will be a 
sharing for $s_j$ for all $j \in I_k$. We also denote by 

$$\Pi_r(x) = \prod_{u \in [d+1],\, u \neq r} \frac{x - \alpha_u}{\alpha_r - \alpha_u}$$

the Lagrange polynomial. 


Protocol 3. ReshareBivariateToUnivariate 


INPUT: A set $\mathcal{P} = \{P_1, \dots, P_n\}$ holding a bivariate sharing for a batch of 
$\ell = mv$ secrets $s_1 = g(\beta_1, \beta_1), \dots, s_\ell = g(\beta_\ell, \beta_\ell)$ and the commitment to 
this sharing $C(g, R_g)$. 

OUTPUT: For all $k \in [v]$, each party $P_r$ holds its share of the univariate sharing 
for the batch of secrets $\{s_j\}_{j \in I_k}$ along with a commitment $C(f_k, R_{f_k})$ 
to that sharing. 


1. For $k \in [v]$: 
(a) For $r \in [d+1]$, $P_r$ samples polynomials $q_r, R_{q_r} \leftarrow \mathbb{P}_d$ such that 

$$g(\alpha_r, \beta_j)\, \Pi_r(\beta_j) = q_r(\beta_j), \quad R_g(\alpha_r, \beta_j)\, \Pi_r(\beta_j) = R_{q_r}(\beta_j), \quad \forall j \in I_k.$$

$P_r$ broadcasts $C(q_r, R_{q_r})$. 
(b) For all $j \in I_k$, each participant computes locally $C(q_r(\beta_j), R_{q_r}(\beta_j))$ 
and $C(g(\alpha_r, \beta_j), R_g(\alpha_r, \beta_j))$ and verifies that 

$$C(q_r(\beta_j), R_{q_r}(\beta_j)) = \Pi_r(\beta_j) \cdot C(g(\alpha_r, \beta_j), R_g(\alpha_r, \beta_j)).$$

If the verification fails, $P_r$ is added to the list of corrupted participants 
B. The protocol aborts and each participant outputs B. 

(c) For $r \in [d+1]$ and $r' \in [n]$, $P_r$ sends the opening $o(q_r(\alpha_{r'}), R_{q_r}(\alpha_{r'}))$ to $P_{r'}$. 
The receiver $P_{r'}$ broadcasts a bit indicating if the opening is correct. 
For each share for which an irregularity was reported, $P_r$ broadcasts 
the opening. If the opening is correct, $P_{r'}$ accepts the value, otherwise 
$P_r$ is disqualified and added to the set of corrupted parties B. The 
protocol aborts and each party outputs B. 


(d) Each party $P_r$ sets its share as $f_k(\alpha_r) = \sum_{u=1}^{d+1} q_u(\alpha_r)$ and locally 
computes the commitment to $f_k$. 
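
The algebra behind Protocol 3 can be checked with a small sketch that omits all commitments, openings and abort handling. The prime modulus, the integer evaluation points, the padding points used to randomise each $q_r$, and the helper names (`basis`, `interp`, `eval_g`, `reshare`) are all assumptions made for illustration and are not taken from the paper.

```python
import random

P = 2**61 - 1  # prime modulus, assumed for the sketch

def basis(r, x, xs):
    """Lagrange basis polynomial for node xs[r], evaluated at x (mod P)."""
    num = den = 1
    for u, xu in enumerate(xs):
        if u != r:
            num = num * (x - xu) % P
            den = den * (xs[r] - xu) % P
    return num * pow(den, -1, P) % P

def interp(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    return sum(y * basis(r, x, xs) for r, y in enumerate(ys)) % P

def eval_g(coef, x, y):
    """Evaluate the bivariate polynomial with coefficient matrix coef at (x, y)."""
    return sum(c * pow(x, i, P) * pow(y, j, P)
               for i, row in enumerate(coef) for j, c in enumerate(row)) % P

def reshare(coef, alphas, betas_k, d, rng):
    """Return f_k(alpha) for every alpha, a univariate sharing of g(beta_j, beta_j)."""
    nodes = alphas[:d + 1]
    # Extra points where each q_r takes random values (keeps q_r of degree d).
    pad = [max(alphas + betas_k) + 1 + i for i in range(d + 1 - len(betas_k))]
    shares = [0] * len(alphas)
    for r in range(d + 1):
        # q_r is pinned to g(alpha_r, beta_j) * Pi_r(beta_j) at every beta_j.
        ys = [eval_g(coef, nodes[r], b) * basis(r, b, nodes) % P for b in betas_k]
        ys += [rng.randrange(P) for _ in pad]
        for rp, a in enumerate(alphas):
            shares[rp] = (shares[rp] + interp(betas_k + pad, ys, a)) % P
    return shares
```

Summing the $q_r$ reproduces Lagrange interpolation of $g(\cdot, \beta_j)$ in the first variable, so any $d + 1$ of the resulting shares interpolate to $g(\beta_j, \beta_j)$ at $\beta_j$.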


5.2 Blinding Bivariate Mask Generation 


The goal of the BlindingBivariateGeneration protocol is for the participants 
to share and generate a bivariate sharing $h$ of the $m$ values $0, \dots, 0$. Hence, it 
will verify $h(\beta_j, \beta_j) = 0$ for $j \in I$, where $I \subset [\ell]$ has size $m$ (typically the $I_k$ 
defined for Protocol 3). This sharing will be used to blind some values during 
the multiplication protocol, which is why we will sometimes call this sharing a 
blinding mask. As this protocol is designed to be used during the multiplication, 
we need $h$ to satisfy some very specific conditions on its values and their 
distribution to the participants. This condition depends on some threshold $t$ 
and is very ad hoc but will allow us to quantify the “amount of randomness” 
we need from the blinding mask. We write $C_{\mathrm{Mult}}(t)$ for this condition. For a subset 
$T \subset [n]$, we write $E_T$ for the biggest system of independent equations gathered by 
an adversary $\mathcal{A}$ corrupting each $P_r$ for $r \in T$. The condition $C_{\mathrm{Mult}}(t)$ can be 
expressed as follows: 


Definition 3. A bivariate sharing $h$ of degree $d$ for $m$ secrets is said to satisfy 
the condition $C_{\mathrm{Mult}}(t)$ for $t < n$ if for every subset $T \subset [n]$ of size $t$: 


- For any $u \notin T$, the degree of freedom of $h(\alpha_u, \cdot)$ with respect to the adversary’s 
knowledge is $d + 2 - m$. 

- For a system of equations $E$ on $h$, let us write $X_E = \{x : \exists y \text{ such that } h(x, y) \in E\}$. 
For any $u \notin T$, any value $z$ and any system of equations $E$ such that 
$\#X_E < d - t$ and $\alpha_u \notin X_E$, the evaluation equation of $h(\alpha_u, z)$ is independent 
of $E \cup E_T$. 


Protocol 4. BlindingBivariateGeneration 


INPUT: A set of indices $I \subset [\ell]$ of size $m$. 
OUTPUT: A bivariate sharing of $0, \dots, 0$ at the points $(\beta_j)_{j \in I}$ is distributed 
to the parties along with the commitment to this sharing. 


1. For $r \in [n]$, $P_r$ samples $q_r, R_{q_r} \leftarrow \mathbb{P}_d$ such that $q_r(\beta_j) = 0$ for $j \in I$ 
and broadcasts the commitment $C(q_r, R_{q_r})$. 

2. For $j \in I$, each $P_r$ broadcasts the openings $o(0, R_{q_r}(\beta_j))$. If one of the 
openings is not correct, the corresponding participant is added to the list 
of corrupted participants B. 

3. For $r \in [n]$, $r' \in [n]$, $P_r$ sends to $P_{r'}$ the opening $o(q_r(\alpha_{r'}), R_{q_r}(\alpha_{r'}))$. 
If the opening is not correct, the corresponding participant is added to 
the list of corrupted participants B. 

4. For $r' \in [n]$, $P_{r'}$ computes $q(\alpha_{r'}) = \sum_{r=1}^{n} q_r(\alpha_{r'})$, $R_q(\alpha_{r'}) = \sum_{r=1}^{n} R_{q_r}(\alpha_{r'})$, 
and locally computes the commitment $C(q, R_q)$. 

5. For $r \in [d+1]$, $P_r$ samples $h(\alpha_r, \cdot), R_h(\alpha_r, \cdot) \leftarrow \mathbb{P}_d$ such that 
$h(\alpha_r, \beta_j) = q(\alpha_r)$, $R_h(\alpha_r, \beta_j) = R_q(\alpha_r)$ for $j \in I$, and broadcasts the commitment 
$C(h(\alpha_r, \cdot), R_h(\alpha_r, \cdot))$. 
(Note that this implicitly defines random bivariate polynomials $h, R_h$ of 
degree $d$.) 

6. For $j \in I$ and $r \in [d+1]$, parties locally compute $C(h(\alpha_r, \beta_j), R_h(\alpha_r, \beta_j))$ 
and $C(q(\alpha_r), R_q(\alpha_r))$ and check if $C(h(\alpha_r, \beta_j), R_h(\alpha_r, \beta_j)) = C(q(\alpha_r), R_q(\alpha_r))$. 
If one commitment does not satisfy the equation, the 
corresponding participant is added to the list of corrupted participants B. 

7. For $r' \in [d+2, n]$, $\{P_1, \dots, P_{d+1}\} \cup \{P_{r'}\}$ perform Recover on $h$. 
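
The reason the resulting $h$ vanishes on the diagonal points $(\beta_j, \beta_j)$ can be checked numerically. The sketch below keeps only the polynomial arithmetic of steps 1, 4 and 5 and drops every commitment and opening; the modulus, the integer evaluation points, the padding grid and the names `basis`, `interp` and `blinding_mask` are assumptions for illustration.

```python
import random

P = 2**61 - 1  # prime modulus, assumed for the sketch

def basis(r, x, xs):
    """Lagrange basis polynomial for node xs[r], evaluated at x (mod P)."""
    num = den = 1
    for u, xu in enumerate(xs):
        if u != r:
            num = num * (x - xu) % P
            den = den * (xs[r] - xu) % P
    return num * pow(den, -1, P) % P

def interp(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    return sum(y * basis(r, x, xs) for r, y in enumerate(ys)) % P

def blinding_mask(alphas, betas_I, d, rng):
    """Return (grid, rows) with rows[r][i] = h(alpha_r, grid[i]) for r < d + 1."""
    pad = [max(alphas + betas_I) + 1 + i for i in range(d + 1 - len(betas_I))]
    grid = betas_I + pad
    # Steps 1 and 4: q is the sum of n random polynomials vanishing on the beta_j.
    q_at_alpha = [0] * len(alphas)
    for _ in alphas:
        ys = [0] * len(betas_I) + [rng.randrange(P) for _ in pad]
        for rp, a in enumerate(alphas):
            q_at_alpha[rp] = (q_at_alpha[rp] + interp(grid, ys, a)) % P
    # Step 5: the row h(alpha_r, .) equals q(alpha_r) at each beta_j, random elsewhere.
    rows = [[q_at_alpha[r]] * len(betas_I) + [rng.randrange(P) for _ in pad]
            for r in range(d + 1)]
    return grid, rows
```

Since $h(x, \beta_j)$ agrees with $q(x)$ at $d + 1$ points and both have degree at most $d$, interpolating the column at $x = \beta_j$ yields $q(\beta_j) = 0$.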


5.3 Bivariate Product 


The BivariateProduct protocol aims at creating a bivariate sharing $g^*$ for the 
multiplication of the secrets contained in two univariate sharings $f, f'$ under the 
blinding bivariate mask $h$. To distribute $g^*(x, y) = f(x) f'(y) + h(y, x)$, each pair 
of participants is going to interact. The core of this exchange is a zero-knowledge 
multiplication protocol, denoted ZK-Mult, described in the full version and based 
on [9, Sect. 6.2]. When composed with the key generation KeyGen of the Paillier 
encryption, this protocol ZK-Mult securely realizes the $\mathcal{F}_{\mathrm{prod}}$ functionality (see 
Fig. 1). The protocol is performed by two participants $P_1, P_2$ where the input 
for $P_1$ is a key pair for the Paillier encryption scheme and a value $x$ while $P_2$ 
has as input the corresponding public key and two values $y, \delta$. At the end of the 
protocol, $P_1$ learns $x \cdot y + \delta$. This subprotocol will be repeated to compute products 
of polynomials. 
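
The input/output behaviour of the product functionality can be illustrated with textbook Paillier encryption. This is a toy sketch under loud assumptions: the key is built from two fixed Mersenne primes and is far too small to be secure, the zero-knowledge proofs of ZK-Mult are omitted entirely, and the names `keygen`, `encrypt`, `decrypt` and `p2_respond` are hypothetical.

```python
import math
import random

def keygen():
    """Toy Paillier key from two fixed Mersenne primes (insecure, illustration only)."""
    p, q = 2**31 - 1, 2**61 - 1
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))  # public key n, secret key (phi, phi^-1 mod n)

def encrypt(n, m, rng):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random unit r."""
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(1 + n, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, sk, c):
    """Recover m from c using L(u) = (u - 1) / n."""
    phi, mu = sk
    return (pow(c, phi, n * n) - 1) // n * mu % n

def p2_respond(n, c_x, y, delta, rng):
    """P2 turns Enc(x) into a fresh encryption of x*y + delta without learning x."""
    return pow(c_x, y, n * n) * encrypt(n, delta, rng) % (n * n)
```

In this sketch $P_1$ would send `encrypt(n, x, rng)`, $P_2$ would answer with `p2_respond(...)`, and $P_1$ would decrypt to obtain $x \cdot y + \delta$ modulo $n$.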


$\mathcal{F}_{\mathrm{prod}}$ interacts with two parties $P_1$ and $P_2$. 
Upon reception of $x$ from $P_1$ and $y, \delta$ from $P_2$, $\mathcal{F}_{\mathrm{prod}}$ sends $xy + \delta$ to $P_1$ and 
the special symbol $\bot$ to $P_2$. 


Fig. 1. Functionality $\mathcal{F}_{\mathrm{prod}}$ 


Protocol 5. BivariateProduct 


INPUT: A set $I$ of indices of size $m$ and a set $\mathcal{P} = \{P_1, \dots, P_n\}$ of participants 
holding two univariate sharings $f, f'$ to batches $(s_i)_{i \in I}$, $(s'_i)_{i \in I}$ of $m$ 
secrets and one bivariate sharing $h$ to $m$ zeroes and the commitments to 
these sharings $C(f, R_f)$, $C(f', R_{f'})$, $C(h, R_h)$. 

OUTPUT: The bivariate sharing of $g^*(x, y) = f(x) f'(y) + h(y, x)$ and the 
commitment $C(g^*, R_{g^*})$ (with $R_{g^*}(x, y) = f(x) R_{f'}(y) + R_h(y, x)$). 


1. For all $r \in [n]$ and $r' \in [d+1]$, 

(a) $P_r$ samples $(pk_r, sk_r), (pk_{r,R}, sk_{r,R}) \leftarrow$ KeyGen, produces two NIZK 
proofs of key generation and broadcasts all the public keys and corresponding 
proofs. 

(b) $P_{r'}$ verifies the proofs and if one of the verifications fails, $P_r$ is added 
to the set of corrupted participants. 

(c) $P_r$ and $P_{r'}$ perform ZK-Mult on inputs $pk_r, sk_r, f(\alpha_r)$ for $P_r$ and 
$pk_r, f'(\alpha_{r'}), h(\alpha_{r'}, \alpha_r)$ for $P_{r'}$. $P_r$ obtains output $g^*(\alpha_r, \alpha_{r'})$. 

(d) $P_r$ and $P_{r'}$ perform ZK-Mult on inputs $pk_{r,R}, sk_{r,R}, f(\alpha_r)$ for 
$P_r$ and $pk_{r,R}, R_{f'}(\alpha_{r'}), R_h(\alpha_{r'}, \alpha_r)$ for $P_{r'}$. $P_r$ obtains output 
$R_{g^*}(\alpha_r, \alpha_{r'})$. 

(e) $P_r$ computes locally the commitments $C(f'(\alpha_{r'}), R_{f'}(\alpha_{r'}))$ and 
$C(h(\alpha_{r'}, \alpha_r), R_h(\alpha_{r'}, \alpha_r))$ and checks that 

$$C(g^*(\alpha_r, \alpha_{r'}), R_{g^*}(\alpha_r, \alpha_{r'})) = \left(f(\alpha_r) \cdot C(f'(\alpha_{r'}), R_{f'}(\alpha_{r'}))\right) \times C(h(\alpha_{r'}, \alpha_r), R_h(\alpha_{r'}, \alpha_r)).$$

If the verification fails, $P_{r'}$ is added to the list of corrupted participants 
B. 

2. $P_r$ broadcasts the commitment $C(g^*(\alpha_r, \cdot), R_{g^*}(\alpha_r, \cdot))$ for all 
$r \in [d+1]$. 


5.4 Random Evaluation for Commitment Verification 


The goal of Protocol 6 is to generate a random value rand used to ensure correctness 
of the shared commitments in Protocol 5 by revealing f(rand) in clear. 


Protocol 6. CommitmentVerification 


INPUT: Two univariate sharings $f, f'$ and one bivariate sharing $h$ distributed 
among a set $\mathcal{P} = \{P_1, \dots, P_n\}$. Another bivariate sharing 
$g^*(x, y) = f(x) f'(y) + h(y, x)$ distributed among the participants along with 
the commitment $C(g, R_g)$ for some bivariate polynomials $g, R_g$. 

OUTPUT: A bit $b \in \{0, 1\}$. 

1. For $r \in [n]$, $P_r$ samples $v_r, R_{v_r} \leftarrow \mathbb{F}$ and broadcasts $C(v_r, R_{v_r})$. 

2. For $r \in [n]$, $P_r$ broadcasts $o(v_r, R_{v_r})$. If the verification fails, $P_r$ is 
added to the list of corrupted participants B. 
Then, every party computes $\mathrm{rand} = \sum_{u=1}^{n} v_u$. If rand is equal to any of the 
evaluation points of the protocol ($\alpha_r$ or $\beta_j$), go back to step 1. 

3. For $r \in [n]$, $P_r$ samples $f_r, R_{f_r} \leftarrow \mathbb{P}_d$ such that $f_r(\mathrm{rand}) = 0$ and 
broadcasts $C(f_r, R_{f_r})$. 

4. For $r \in [n]$, $P_r$ sends to all $P_{r'}$ the opening $o(f_r(\alpha_{r'}), R_{f_r}(\alpha_{r'}))$. 
The receiver $P_{r'}$ broadcasts a bit indicating if the opening is correct. 
For each share for which an irregularity was reported, $P_r$ broadcasts the 
opening. If the opening is correct, $P_{r'}$ accepts the value, otherwise $P_r$ is 
disqualified and added to the set of corrupted parties B. The protocol 
aborts and each party outputs B. 

5. Each $P_r$ broadcasts the openings $o(0, R_{f_r}(\mathrm{rand}))$. If one of the openings is 
not correct, the corresponding participant is added to the list of corrupted 
participants B. 

6. Set $\tilde{f} = f + \sum_u f_u$ and $R_{\tilde{f}} = R_f + \sum_u R_{f_u}$. For $r \in [n]$, 
$P_r$ broadcasts the opening $o(\tilde{f}(\alpha_r), R_{\tilde{f}}(\alpha_r))$. 

7. For all $r \in [n]$, each party computes locally $C(\tilde{f}(\alpha_r), R_{\tilde{f}}(\alpha_r))$ and verifies 
that the opening is correct. If the verification fails, $P_r$ is added to the 
list of corrupted participants B. 

8. Each party computes the value $\tilde{f}(\mathrm{rand}) = f(\mathrm{rand})$ by interpolation. 

9. For all $r' \in [d+1]$, each participant computes locally the commitments 
$C(f'(\alpha_{r'}), R_{f'}(\alpha_{r'}))$, $C(h(\alpha_{r'}, \mathrm{rand}), R_h(\alpha_{r'}, \mathrm{rand}))$ and 
$C(g(\mathrm{rand}, \alpha_{r'}), R_g(\mathrm{rand}, \alpha_{r'}))$ and checks that 

$$C(g(\mathrm{rand}, \alpha_{r'}), R_g(\mathrm{rand}, \alpha_{r'})) = \left(f(\mathrm{rand}) \cdot C(f'(\alpha_{r'}), R_{f'}(\alpha_{r'}))\right) \times C(h(\alpha_{r'}, \mathrm{rand}), R_h(\alpha_{r'}, \mathrm{rand})).$$

If one of the verifications fails, the participant broadcasts a 0, otherwise 
it broadcasts a 1. 

10. If a 0 was broadcast in the previous step, the protocol outputs 
$b = 0$, otherwise it outputs $b = 1$. 


5.5 The Multiplication Protocol 


We are now ready to introduce the whole protocol performing the multiplication 
of two bivariate sharings $g, g'$ containing $\ell$ secrets each. The output $g''$ is a bivariate 
sharing for the $\ell$ pair-wise products of secrets in $g, g'$. We also have a value 
$m$ which must respect the bound $m < d - t_p$ to obtain security. For simplicity 
we assume that $\ell = mv$. The generalization for any $\ell$ is straightforward. 

Protocol 7. Mult 


INPUT: A set $\mathcal{P} = \{P_1, \dots, P_n\}$ holding two bivariate sharings $g, g'$ for two 
batches of $\ell = mv$ secrets $s_1, \dots, s_\ell$ and $s'_1, \dots, s'_\ell$ and the commitment to 
these sharings $C(g, R_g)$, $C(g', R_{g'})$. 

OUTPUT: Each party holds its shares for the bivariate sharing $g''$ for the 
batch of $\ell$ secrets $s_1 s'_1, \dots, s_\ell s'_\ell$ along with the commitment $C(g'', R_{g''})$. 


1. The participants perform ReshareBivariateToUnivariate on $g$ and 
$g'$. We write $(f_k)_{k \in [v]}$ and $(f'_k)_{k \in [v]}$ for the outputs of the two executions. 
2. For $k \in [v]$, 
1. Set $I_k = [1 + (k-1)m, km]$ and execute BlindingBivariateGeneration 
on input $I_k$. We write $h_k$ for the bivariate sharing obtained 
in output. 

2. Participants perform BivariateProduct on the sharings $f_k, f'_k, h_k$ 
and set $I_k$ to obtain $g^*_k$, a bivariate sharing with a bivariate commitment 
$C(g^*_k, R_{g^*_k})$. 

3. Participants execute CommitmentVerification on sharings 
$f_k, f'_k, h_k, g^*_k$. If the output is 0, the protocol aborts and each 
party outputs B, the set of corrupted participants. 

4. $P_r$ interpolates $g^*_k(\alpha_r, \beta_j)$ for $j \in I_k$. 

3. For $r \in [d+1]$, $P_r$ samples $g''(\alpha_r, \cdot), R_{g''}(\alpha_r, \cdot) \leftarrow \mathbb{P}_d$ with 
$g''(\alpha_r, \beta_j) = g^*_k(\alpha_r, \beta_j)$ and $R_{g''}(\alpha_r, \beta_j) = R_{g^*_k}(\alpha_r, \beta_j)$ for all $k \in [v]$ 
and $j \in I_k$, and broadcasts the commitment $C(g'', R_{g''})$. 
(Note that this implicitly defines a bivariate polynomial $g''$ of degree $d$.) 

4. For all $r \in [d+1]$, each participant computes $C(g^*_k(\alpha_r, \beta_j), R_{g^*_k}(\alpha_r, \beta_j))$ 
and $C(g''(\alpha_r, \beta_j), R_{g''}(\alpha_r, \beta_j))$ and checks equality for all 
$k \in [v]$ and $j \in I_k$. If one of the verifications fails for index $r$, $P_r$ is added 
to the set of corrupted participants B. The protocol aborts and each 
participant outputs B. 

5. For $r' \in [d+2, n]$, $\{P_1, \dots, P_{d+1}\} \cup \{P_{r'}\}$ perform Recover on $g''$. 


Communication Complexity: The complexity of ReshareBivariateToUnivariate 
is $O(n^2 v)$ and $O(n^2)$ for BlindingBivariateGeneration, BivariateProduct 
and CommitmentVerification. Thus, the overall complexity of the 
first two steps is $O(n^2 v)$. The final Recover step is performed in $O(n^2)$ when 
$n - d = O(1)$. Overall, this is $O(n^2 v)$ and $O(n^2/m)$ amortized. When $\ell = n - 2$ 
and $m = \sqrt{\ell}$, we obtain the claimed amortized complexity of $O(n\sqrt{n})$. 
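
The amortized figure can be spelled out in one line, assuming (as reconstructed from the step counts of Protocols 3 to 7) that a full multiplication of $\ell = mv$ secrets costs $O(n^2 v)$ in total:

```latex
\frac{O(n^2 v)}{\ell} \;=\; \frac{O(n^2 v)}{m v} \;=\; O\!\left(\frac{n^2}{m}\right),
\qquad
m = \sqrt{\ell},\ \ell = n - 2
\;\Longrightarrow\;
O\!\left(\frac{n^2}{\sqrt{n-2}}\right) = O\!\left(n\sqrt{n}\right).
```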


Theorem 2. When $\ell < d$ and $m = \lceil \sqrt{\ell} \rceil$ and assuming the hardness of DDH, 
the protocol $\Pi^{\mathrm{MULT}}$ introduced in Protocol 7 securely realizes the ideal process 
$\mathrm{IDEAL}^{Z,S,\mathcal{F}_{\mathrm{MULT}},\ell}_{\mathrm{mixed}}$ against any adversary bounded by the multi-threshold $T(\ell)$, 
where we have $T(\ell) = \{(d - 1 - \sqrt{\ell},\, d - 1 - \sqrt{\ell})\}$. 


The ideal functionality $\mathrm{IDEAL}^{Z,S,\mathcal{F}_{\mathrm{MULT}},\ell}_{\mathrm{mixed}}$ and the proof of Theorem 2 can 
be found in the full version. 


Abstract. Private Set Intersection (PSI) enables two parties, each holding 
a private set, to securely compute their intersection without revealing 
other information. This paper considers settings of secure statistical 
computations over PSI, where both parties hold sets containing identi- 
fiers with one of the parties having an additional positive integer value 
associated with each of the identifiers in her set. The main objective 
is to securely compute some desired statistics of the associated values 
whose corresponding identifiers occur in the intersection of 
the two sets. This is achieved without revealing the identifiers of the 
set intersection. In this paper, we present protocols which enable the 
secure computation of statistical functions over PSI, which we collectively 
term PSI-Stats. Implementations of our constructions are also 
carried out based on simulated datasets as well as on actual datasets in 
the business use cases that we defined, in order to demonstrate practical- 
ity of our solution. PSI-Stats incurs 5x less monetary cost compared to 
the current state-of-the-art circuit-based PSI approach due to Pinkas et 
al. (EUROCRYPT’19). Our solution is more tailored towards business 
applications where monetary cost is the primary consideration. 


Keywords: Private set intersection - Homomorphic encryption - 
Statistical functions 


1 Introduction 


Private set intersection (PSI) enables two parties to learn the intersection of their 
sets without exposing other elements (identifiers or items) that are not within 
this intersection. This has wide-ranging applications in data sharing, private 
contact discovery, private proximity testing [29], privacy-preserving ride-sharing 
[20], botnet detection [28] and human genome testing [5]. We highlight a number 
of notable works in this domain in Sect. 6. 

The main problem statement of our work can be simply described as follows. 
Sender A and receiver B hold sets of identifiers, with receiver B additionally 
holding positive integer values associated with each of the identifiers. Denote the 
sets held by A, B by X and Y respectively. The objective is for B to learn 
the desired statistical output function of some collection (dependent on X) of 
the associated values, while preserving certain private information about their 
respective sets. More formally, B seeks to learn the value $F_D(X, Y)$, where D 
is the decisional rule and F is the desired statistical function computed over D. 
To preserve privacy, A does not learn Y and D(X,Y) while B does not learn X, 
D(X,Y) and |D(X, Y)|. In our context, D is the private set intersection (PSI) of 
the identifiers contained in X and Y. These settings arise in numerous business 
and practical applications. 
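
As a point of reference, the target value $F_D(X, Y)$ can be written as a plain, non-private computation. The sketch below is only the ideal functionality evaluated in the clear, with none of the privacy guarantees discussed in the text; the function names `intersection_values` and `intersection_stat` are hypothetical.

```python
from statistics import mean

def intersection_values(X, Y):
    """D(X, Y): the values t_j whose identifier b_j also appears in A's set X."""
    members = set(X)
    return [t for b, t in Y if b in members]

def intersection_stat(X, Y, F=mean):
    """F_D(X, Y): apply the statistic F to the values selected by D."""
    values = intersection_values(X, Y)
    return F(values) if values else None
```

For example, with `X = ["alice", "bob", "carol"]` and `Y = [("bob", 10), ("dave", 7), ("carol", 20)]`, the selected values are 10 and 20; a private protocol should let B learn only the chosen statistic of them, not which identifiers matched or how many.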


1.1 Our Contributions 


We present PSI-Stats to address this main problem statement. PSI-Stats is a 
collection of protocols to support the secure computations of statistical func- 
tions over PSI. These include a myriad of frequently applied standard statistical 
functions such as various generalized means, standard deviation, variance, etc. 
The proposed protocols achieve the privacy requirements outlined in the problem 
statement. The main contributions are summarized here. 


— PSI-Stats can be enabled to securely compute multiple related statistical func- 
tions within a single executed protocol with minimal additional communica- 
tion and computational overhead, while maintaining the privacy guarantees 
as defined in the main problem statement. Our techniques are also applicable 
to non-symmetric functions such as weighted arithmetic mean. 

— It is undesirable in many instances for receiver B to know both the intersection 
cardinality and the output functionality as the combination of these can reveal 
some information about the intersection set. To address this issue, one key 
contribution of our work is to restrict any such inference information to the 
absolute possible bare minimum. This is achieved by hiding the intersection 
cardinality from receiver B and thus only the desired output functionality 
(and nothing more) is revealed to him. 

— We carried out extensive experiments of our protocols to determine their 
practicality and feasibility. Our test input sizes range from small to large. 
The experimental results demonstrate that PSI-Stats is practical and scales 
well for large input sizes. We also conducted experimental comparisons of our 
protocols with the current state-of-the-art circuit-based PSI protocol due to 
Pinkas et al. [34]. Our protocols incur 5x less monetary cost and 5.2x less 
communication overhead. 


In an interactive protocol, there are three factors in the overall measurement 
of efficiency: the first relates to the communication overhead, the second relates 
to the computational cost and the third relates to the number of communication 
rounds (or round complexity). The work in this paper does not claim to 
outperform circuit-based PSI protocols across all three factors above. As 
an example, the current state-of-the-art for circuit-based PSI protocols is the 
very recent work of Pinkas et al. [34], which we reckon to potentially attain the 
lowest computational cost (after the necessary circuit modifications in order to 
accommodate outputs of statistical functions). 

A goal of our work is to present protocols with minimal communication 
overhead based upon well-established, time-tested hardness assumptions while 
concurrently ensuring that running times remain practical. To that end, the PSI-Stats 
protocols in this paper incur the lowest communication overhead over all 
circuit-based types (inclusive of the most recent state-of-the-art [34]) by several 
factors. In that regard, PSI-Stats is especially relevant in settings where communication 
cost comes at a premium or instances where bandwidth is limited. 

Circuit-based PSI approaches can generally be instantiated by either Yao’s 
garbled circuit protocol [41] or the GMW protocol [19]. While Yao’s protocol pro- 
vides a constant round complexity, the GMW protocol is typically the overall 
preferred option as it has several advantages over the former. A comprehen- 
sive comparison between Yao’s protocol and the GMW protocol can be found 
in [39]. However for circuit-based PSI under the GMW family, the round com- 
plexity is dependent on the circuit depth which increases with increasing set 
sizes and/or increasing bit-length of items. This can potentially be a bottleneck 
in high latency networks. By contrast, the PSI-Stats protocols operate with a 
low constant round complexity of 3, independent of the input set sizes and the 
bit-length of items. 

To the best of our knowledge, alternative approaches to solve the main prob- 
lem statement beyond circuit-based methods are either less efficient in our con- 
text or employ the usage of computationally intensive homomorphic encryption 
schemes, such as [7,9]. We resolve the problem without resorting to the machinery 
of such expensive approaches. It should be noted that our protocols reveal 
the intersection size to sender A. However, this does not enable sender A to 
apply any inference attack based on the associated values held by receiver B, 
as they are all safeguarded by homomorphic encryption in our protocols. On the 
other hand, such attacks are relevant if this intersection cardinality is revealed 
to receiver B, as discussed. Hence, it is crucial that this information is hidden 
from receiver B, which our protocols attain. 


2 Preliminaries 


The security model in this paper operates in the semi-honest setting. In this 
model, adversaries can attempt to obtain information from the execution of 
the protocol but they are unable to perform any deviations from the intended 
protocol steps. The semi-honest model is typically suited in scenarios where 
execution of the software is ensured through software attestation or business 
restrictions, without any assumption that an external untrusted party is unable 
to obtain the transcript of the protocol upon completion. Indeed, the majority 
of the research in related domains also focuses on solutions in the semi-honest 
model. Hereinafter, we shall simply refer to mean as being arithmetic mean 
while references to other generalized means will be stated explicitly. 

In this paper, we say an integer $x$ is $l$-bit (length) if $x \in \mathbb{Z} \cap [2^{l-1}, 2^l - 1]$ and $x$ 
is at most $l$-bit (length) if $x \in \mathbb{Z} \cap [0, 2^l - 1]$. The standard ceiling function is given 
by $\lceil \cdot \rceil$, where $\lceil x \rceil$ represents the smallest integer greater than or equal to $x$. The 
nearest integer function is denoted by $\lfloor \cdot \rceil$, log refers to the natural logarithm, and 
e is the standard mathematical constant (i.e. the base of the natural logarithm). 
The participants’ setting and notations in the descriptions of our protocols in this 
paper are identical. We provide them here to serve as a convenient common reference. 


Notations 

$a_i$, $b_j$: identifiers. 

A holds $X = \{a_1, a_2, \dots, a_m\}$. 

B holds $Y = \{(b_1, t_1), (b_2, t_2), \dots, (b_n, t_n)\}$, $t_j \in \mathbb{Z}^+$. 

$Y' = \{b_1, b_2, \dots, b_n\}$. 

E(.): Paillier encryption with a 3072-bit modulus. 

h(.): SHA-256 hash function. 

G: a multiplicative group of integers of large prime order. 


3 Private Set Intersection-Mean 


This section describes a protocol to correctly output only the intersection mean 
(i.e. without disclosing the intersection sum or the intersection cardinality to B). There 
are numerous flexible applications for the intersection mean functionality apart 
from the secure computation over numerical values. For instance, records in a 
dataset can be encoded as 0 or 1 to represent entries of a binary attribute such 
as “gender”. The arithmetic mean of these encoded values can thus directly pro- 
vide the percentages of “females” and “males” which belong in the intersection 
of the two datasets. We provide concrete recommendations for the appropriate 
sizes of the various parameter values for use in our protocols which we also show 
to provide strong security guarantees satisfying statistical indistinguishability. 


PROTOCOL 1 (Private Set Intersection-Mean) 


Input: A inputs set X; B inputs set Y. 
Output: A outputs $|X \cap Y'|$, B outputs intersection mean. 


1. Setup: A and B jointly agree on E, a hash function h and a group G of 
large prime order. B generates a public-private key pair of E, announces the 
public key and keeps the private key to herself. 

2. A’s encryption phase: A 

(a) selects a random private exponent $k_1 \in G$; 

(b) computes $h(a_i)^{k_1}$. 

A sends $h(a_i)^{k_1}$ to B. 

3. B’s encryption phase: B 

(a) selects a random private exponent $k_2 \in G$; 

(b) computes $h(a_i)^{k_1 k_2}$; 

(c) computes $\{(h(b_j)^{k_2}, E(t_j))\}$. 

B returns $h(a_i)^{k_1 k_2}$ in shuffled order to A. B sends $\{(h(b_j)^{k_2}, E(t_j))\}$ to A. 

4. Matching & homomorphic computations: A 

(a) computes $\{(h(b_j)^{k_2 k_1}, E(t_j))\}$; 

(b) computes the set $I$ of intersection indices where 

$$I = \{j : h(a_i)^{k_1 k_2} = h(b_j)^{k_2 k_1} \text{ for some } i\};$$

(c) samples a uniformly random 1024-bit value of $r$; 

(d) selects uniformly random integer values of $r_1, r_2$, where $0 \le r_1 \le 2^{128} - 1$, 
$2^{511} \le r_2 \le 2^{512} - 1$, with $r_1$ satisfying 

$$r_1 \equiv r \pmod{k}$$

where $k = |I|$; 

(e) additive homomorphically computes 

$$E\left(r_2 + \frac{r - r_1}{k} \sum_{i \in I} t_i\right).$$

A sends $r$ and $E\left(r_2 + \frac{r - r_1}{k} \sum_{i \in I} t_i\right)$ to B. 

5. B’s decryption phase: B performs decryptions of the ciphertext received 
from A and computes (division over real numbers) 

$$M' = \frac{1}{r}\left(r_2 + \frac{r - r_1}{k} \sum_{i \in I} t_i\right).$$

Theorem 1. Protocol 1 correctly outputs the intersection mean (and which can 
also be made arbitrarily close to the exact value). 
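
The matching part of Protocol 1 (Steps 2 to 4(b)) can be sketched as follows, leaving out the Paillier ciphertexts that travel alongside B's blinded identifiers. The group here is a toy stand-in: the multiplicative group modulo the Mersenne prime $2^{127} - 1$ is an assumption made only to keep the example self-contained and is not a prime-order group as the protocol requires; the names `h`, `random_exponent`, `blind` and `matching_indices` are hypothetical.

```python
import hashlib
import math
import random

P = 2**127 - 1  # toy prime modulus (assumption); a real deployment needs a proper DDH group

def h(identifier):
    """Hash an identifier into the range 1..P-1 with SHA-256 (toy hash-to-group)."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def random_exponent(rng):
    """Pick an exponent coprime to P - 1 so that x -> x^k is a bijection."""
    while True:
        k = rng.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k

def blind(values, k):
    return [pow(v, k, P) for v in values]

def matching_indices(X, Y_ids, k1, k2, rng):
    """Indices j of Y_ids whose identifier lies in X, found from doubly blinded values."""
    from_a = blind([h(a) for a in X], k1)        # Step 2: A sends h(a_i)^k1
    double_a = blind(from_a, k2)                 # Step 3(b): B raises to k2
    rng.shuffle(double_a)                        # B returns them in shuffled order
    from_b = blind([h(b) for b in Y_ids], k2)    # Step 3(c): B sends h(b_j)^k2
    double_b = blind(from_b, k1)                 # Step 4(a): A raises to k1
    seen = set(double_a)
    return [j for j, v in enumerate(double_b) if v in seen]  # Step 4(b)
```

Because exponentiation commutes, equal identifiers produce equal doubly blinded values, while the shuffle prevents A from linking a match back to a position in its own set.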


3.1 On the Chosen Sizes of r, r1, r2 


The larger the value of $r$, the closer the approximation of output $M'$ is to the 
exact mean value $M$. Moreover, this approximation can be made arbitrarily close 
for arbitrarily large values of $r$ (along with correspondingly large parameter sizes 
of E). In practice, a sufficiently large value of $r$ already provides a very tight 
approximation. The size choices of $r_1, r_2$ are 128 bits and 511 bits respectively 
to prevent exhaustive search attacks. After the sizes of $r_1, r_2$ are set, the size of $r$ 
can be chosen to sufficiently overwhelm $r_1, r_2$. In our case, we set $r$ to be of size 
1024-bit, which is sufficient for $M'$ and $M$ to be extremely close. In particular, 


IM — M'| < 2752M (1) 


which suffices for all practical purposes. 


3.2 Flexibility of Protocol 


It should be noted that the exact mean value in fact reveals additional information 
about the sum or cardinality. For instance, if the mean is an integer 
M then it follows that the sum is divisible by M. More generally, if the mean 
is a rational number $\frac{a}{b}$ with gcd(a,b) = 1, then the sum is divisible by a and 
the cardinality is divisible by b. As discussed in the preliminary section, we 
do not consider such implicit information which can be deduced from the output 
functionality. Nevertheless, Protocol 1 has the added benefit of flexibility 
which enables the adjustment of varying degrees of approximation tightness if 
one wishes to circumvent the above issues. This can be achieved by adjusting 
the size of a randomly sampled r. The approximation weakens with decreasing 
sizes of r. More generally, for a random sample r of x-bit, x > 515, the difference 
yields 

|M — M'| < 2°?" mM, (2) 
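
The effect of the sizes of $r$, $r_1$, $r_2$ on the approximation can be checked with exact rational arithmetic. The sketch below performs the blinding of Step 4(c) to 4(e) and B's final division in the clear, without any encryption; the function names and the way $r_1$ is drawn (roughly uniform among 128-bit values congruent to $r$ modulo $k$) are assumptions made for illustration.

```python
import random
from fractions import Fraction

def blind_mean(values, rng, r_bits=1024):
    """Return (r, c) with c = r2 + (r - r1)/k * sum(values), computed in the clear."""
    k = len(values)
    r = rng.randrange(2**(r_bits - 1), 2**r_bits)      # random r_bits-bit integer
    r1 = rng.randrange(0, 2**128 // k) * k + r % k      # r1 = r (mod k), r1 < 2^128
    r2 = rng.randrange(2**511, 2**512)                  # 512-bit mask
    return r, r2 + (r - r1) // k * sum(values)          # (r - r1) is divisible by k

def recover_mean(r, c):
    """B's final step, M' = c / r, kept as an exact fraction."""
    return Fraction(c, r)
```

Writing $M' = M + (r_2 - r_1 M)/r$ shows that the error is governed by the ratio between the 512-bit mask and $r$; shrinking `r_bits` loosens the approximation, which is the flexibility discussed above.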


3.3 Security Analysis 


The security arising from the communication in Step 2 and Step 3 follows from 
the validity of the Decisional Diffie-Hellman assumption as well as the hardness 
of the Decisional Composite Residuosity Problem. Hence, the remaining security 
and privacy aspects to consider occur in Step 4 where B receives $r$ and 
$E\left(r_2 + \frac{r - r_1}{k} \sum_{i \in I} t_i\right)$. Since $r$ is sampled uniformly at random from a collection 
of 1024-bit integers, B is unable to distinguish $r$ from a random uniformly 
selected 1024-bit integer. As before, denote M to be the exact value of the intersection 
mean. Thus, B obtains $r_2 + \frac{r - r_1}{k} \sum_{i \in I} t_i = r_2 + (r - r_1)M$. Moreover, 
only $r$ and $M'$¹ are known to B, who can thus only effectively compute $r_2 - r_1 M$, 
which we show in the following is statistically indistinguishable from a uniformly 
sampled 512-bit integer. This serves the purpose of hiding the cardinality and 
intersection sum. We begin with a standard security definition. 


Definition 1. Let X and Y be two distributions over {0,1}^n. The statistical
distance of X and Y, denoted by \Delta(X,Y), is defined to be

    \Delta(X,Y) = \max_{U \subseteq {0,1}^n} | Pr[X \in U] - Pr[Y \in U] |
                = (1/2) \sum_{v \in Supp(X) \cup Supp(Y)} | Pr[X = v] - Pr[Y = v] |.    (3)

X is \epsilon-statistically indistinguishable from Y if \Delta(X,Y) < \epsilon.


1 Technically only M’, which is a close approximation to M can be computed by B. 
In essence, this distinction is largely irrelevant in this specific context as we evaluate 
a stronger security setting than required where B has the knowledge of both M and 
M'. 
In particular, we show that r2 - r1 M is statistically indistinguishable from
uniformly distributed 512-bit integers, when M is a fixed positive integer. It can
be assumed here that the intersection mean M is at most 80-bit in all practical
settings. Denote random variables R1 = unif[0,c] and R2 = unif[a,b] to be the
discrete uniform distributions over Z \cap [0,c] and Z \cap [a,b] respectively
such that cM < a < b. We first establish the following.


Theorem 2.

    \Delta(R2 - R1 M, R2) = cM / (2(b - a + 1)).    (4)

In Protocol 1, a = 2^{511}, b = 2^{512} - 1, c = 2^{128} - 1 and M is assumed to be
at most 80-bit. Hence in Protocol 1,

    \Delta(R2 - R1 M, R2) < 2^{-300}.    (5)


This shows that r2 - r1 M is 2^{-300}-statistically indistinguishable from
uniformly distributed 512-bit integers when M is a positive integer. Here, we
simply apply 80-bit as a concrete upper bound of M in all practical use cases.
It can in fact be much larger than 80-bit subject to the corresponding
constraints of (5). In cases where M = a/b is a non-integer positive rational
number such that a < r2, a similar argument can be applied to show that
r2 b - r1 a is statistically indistinguishable from uniformly distributed 512-bit
integers which are multiplied by a factor of b.
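
The closed form cM / (2(b - a + 1)) for the statistical distance can be sanity-checked by brute force on small parameters. The sketch below is only an illustration: the toy values a = 64, b = 127, c = 7, M = 3 are assumptions chosen so that cM < a < b holds, and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def uniform(lo, hi):
    """Discrete uniform distribution on the integers lo..hi, as a dict."""
    n = hi - lo + 1
    return {v: Fraction(1, n) for v in range(lo, hi + 1)}


def masked_difference(a, b, c, m):
    """Exact distribution of R2 - R1*m with R1 ~ unif[0,c], R2 ~ unif[a,b]."""
    dist = defaultdict(Fraction)
    weight = Fraction(1, (b - a + 1) * (c + 1))
    for r2 in range(a, b + 1):
        for r1 in range(c + 1):
            dist[r2 - r1 * m] += weight
    return dict(dist)


def statistical_distance(p, q):
    """Half the L1 distance between two finite distributions."""
    support = set(p) | set(q)
    return sum(abs(p.get(v, 0) - q.get(v, 0)) for v in support) / 2


a, b, c, m = 64, 127, 7, 3          # toy parameters with c*m < a < b
measured = statistical_distance(masked_difference(a, b, c, m), uniform(a, b))
predicted = Fraction(c * m, 2 * (b - a + 1))
print(measured, predicted)
```

With the parameters of Protocol 1 the same expression is at most 2^{128} * 2^{80} / 2^{512}, which is below the 2^{-300} bound stated in (5).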


3.4 Geometric Mean 


Our method can be adapted to output the intersection geometric mean func- 
tionality without revealing the intersection size to B. One application arises in 
the computation of the Atkinson index [4] of income inequality which is a func- 
tion of both the geometric mean and arithmetic mean. Another such instance 
of the geometric mean can arise in Econometrics as specified by the generalized 
Cobb-Douglas production function [14], where its inputs can represent working 
hours of labourers and each exponent is the reciprocal of the number of inputs. 
The geometric mean of a data set {t_1, t_2, ..., t_k} is given by
(\prod_{i=1}^{k} t_i)^{1/k}. A
natural line of approach is to replace additive homomorphic encryption with a
multiplicative homomorphic encryption (e.g. RSA [38]) in view of the multiplica- 
tive structure of the output. This works perfectly fine if the intersection size is 
exposed to B. However, in the security model where the intersection size is kept 
secret from B, there is no known efficient public-key based protocol utilizing 
multiplicative homomorphic encryption which can achieve this with reasonable 
accuracy. We present a method to attain this desired functionality using ideas 
based on our earlier approach for arithmetic mean. 

Without being overly verbose, we do not detail the fully fledged protocol
but instead highlight the crucial steps. Suppose t_i \in [1, t + 1]. We first
seek an injective function f_c : t_i -> \lfloor c log t_i \rceil for some
positive integer constant c. Here \lfloor . \rceil denotes the nearest integer
function in accordance with the most recent IEEE Standard for floating-point
arithmetic [1]. The value of c is chosen by B, the party who holds the list of
associated values. We show that f_c can be constructed by taking c > t + 1.


Theorem 3. f_c is injective for all c > t + 1.


The injectivity condition is stipulated to ensure that no two distinct values of
t_i's correspond to the same image under f_c. The protocol for the intersection
geometric mean proceeds by replacing t_i with \lfloor c log t_i \rceil in
Protocol 1. One other difference lies in the final decryption step performed by
B. More specifically, in the final step, B obtains the intersection geometric
mean by computing


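    exp( (1/(cr)) ( r2 + ((r - r1)/k) \sum_{i \in I} \lfloor c log t_i \rceil ) )
        \approx exp( (1/(ck)) \sum_{i \in I} \lfloor c log t_i \rceil )
        \approx exp( (1/k) \sum_{i \in I} log t_i )
        = ( \prod_{i \in I} t_i )^{1/k}.    (6)
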
In general, larger values of c provide greater precision. In practice, such large
values of c can always be chosen since the modulus of the Paillier encryption is
much larger than max{log t_i}. The proofs of correctness and security mirror
those of the intersection mean.
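
The role of the rounding map can be illustrated without any encryption. The sketch below is an assumption-laden simplification: it applies f_c to plaintext values, skips the masking by r, r1, r2 entirely, and uses Python's `round`, which rounds ties to even. The bound t = 100, the sample values and c = 10^6 are arbitrary illustrative choices.

```python
import math


def f_c(t, c):
    """Nearest-integer image of c * log(t)."""
    return round(c * math.log(t))


def geometric_mean_from_images(images, c):
    """Recover an approximate geometric mean from the integer images."""
    k = len(images)
    return math.exp(sum(images) / (c * k))


# Injectivity on [1, t + 1] for a constant c > t + 1 (here t = 100, c = 102).
t_max = 100
images = [f_c(t, t_max + 2) for t in range(1, t_max + 2)]

# Accuracy for a larger constant.
values = [2, 8, 4]                      # exact geometric mean is 4
c = 10**6
approx = geometric_mean_from_images([f_c(t, c) for t in values], c)
print(len(set(images)) == len(images), approx)
```

Each image is off from c log t_i by at most 1/2, so the exponent is off by at most 1/(2c), which is why a larger c tightens the result.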


3.5 Extensions to Variance and Standard Deviation 


Our techniques presented in this section can be further extended to compute
various other statistical functions of the associated values in the intersection.
The general idea is for B to receive the values of the nth-order moments about
the origin in order to compute the nth central moment (without knowledge of
the intersection size). Let R be a random variable. In our context, R can be
considered to be the discrete uniform distribution over the associated values in
the intersection. The nth-order moment about the origin \mu'_n is defined to be
\mu'_n = E[R^n]. The nth central moment \mu_n is defined to be
\mu_n = E[(R - E[R])^n]. For example, the mean in this case is \mu'_1 = E[R].
Variance and standard deviation provide a measure of the amount of dispersion
of a list of values. A low standard deviation indicates that the values tend to
be clustered around their mean, while a high standard deviation indicates that
the values are dispersed over a wider range of values. In step 3 of the protocol,
B sends both E(t_i) and E(t_i^2) to A. This enables A to return values of E[R]
and E[R^2] to B. The variance can then be simply computed by
Var(R) = E[R^2] - (E[R])^2 and the standard deviation by \sigma = \sqrt{Var(R)}.
At the end of this protocol, B outputs the mean and standard deviation (or
variance). At the same time, B's knowledge of E[R^2] during this process does
not reveal any additional information since that quantity can be derived from
any generic protocol which outputs the mean and standard deviation (or
variance). The protocols for the skewness and kurtosis output functionalities
can be similarly constructed.
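
As a small worked example of the final step, the sketch below computes the variance and standard deviation from the first two moments about the origin. It works on plaintext values only; in the protocol B would obtain (approximations of) these two moments without learning the intersection size. The sample values are an arbitrary illustration.

```python
import math
from fractions import Fraction


def moment(values, n):
    """n-th moment about the origin of the uniform distribution on `values`."""
    return Fraction(sum(v**n for v in values), len(values))


def variance_from_moments(m1, m2):
    """Var(R) = E[R^2] - (E[R])^2."""
    return m2 - m1**2


values = [2, 4, 4, 4, 5, 5, 7, 9]
m1 = moment(values, 1)               # E[R]
m2 = moment(values, 2)               # E[R^2]
var = variance_from_moments(m1, m2)
std = math.sqrt(var)
print(m1, m2, var, std)
```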
4 Intersection-Sum with Approximate Composition


Trivially, the intersection sum output S reveals that there are fewer than l
elements (or identifiers) in the intersection with associated values greater
than S/l.
In many scenarios, B wishes to know more about the composition of the sum. 
In particular, given the intersection sum, B wishes to have an estimate of the 
number of elements in the intersection with associated value of at most Ê (i.e. 
its approximate sum composition). This information cannot be captured merely 
by the knowledge of intersection sum. To that end, we present two protocols, 
labelled as type 1 and 2 which enable the output of intersection sum along with 
its approximate sum composition. 


PROTOCOL 2 (Sum Composition type 1)


Input: A inputs set X; B inputs set Y.

Output: A outputs |X \cap Y'|, B outputs \sum_{i \in I} t_i and an upper bound
for \sum_{i \in I} 1/t_i, where I is the set of indices of b_i \in X \cap Y'.


1. Setup: Identical to Protocol 1. 

2. A’s encryption phase: Identical to Protocol 1. 

3. B’s encryption phase: B

(a) selects a random private exponent k2 \in G;

(b) computes h(a_i)^{k1 k2};

(c) computes {h(b_j)^{k2}, E(t_j)}.

B returns h(a_i)^{k1 k2} in shuffled order to A. B sends
{h(b_{\sigma(j)})^{k2}, E(t_{\sigma(j)})} to A, where \sigma is a permutation of j
such that t_{\sigma(j)} \ge t_{\sigma(j+1)}. B sends M to A where

    M \ge \max_j { t_{\sigma(j)} / t_{\sigma(j+1)} }.

4. Matching & homomorphic computations: A
(a) computes {h(b_j)^{k2 k1}, E(t_j)};
(b) computes the set I of intersection indices where

    I = {j : h(a_i)^{k1 k2} = h(b_j)^{k2 k1} for some i};

(c) additive homomorphically computes E( \sum_{i \in I} t_i );
(d) samples uniformly random r, r1, r2 from a sufficiently large set of positive
integers such that r > r1, r2 and r1/r2 > k, where k = |I|;
(e) computes r1 + rk(M^k - 1) and additive homomorphically computes

    E( r2 + r(M - 1) \sum_{i=1}^{k} M^{k-i} t'_i ),

where {t'_i}_{i=1}^{k} is a permutation of {t_i}_{i \in I} s.t. t'_i \le t'_{i+1}.

A sends E( \sum_{i \in I} t_i ), E( r2 + r(M - 1) \sum_{i=1}^{k} M^{k-i} t'_i )
and r1 + rk(M^k - 1) to B.


5. B’s decryption phase: B performs decryptions of the ciphertexts received
from A to obtain \sum_{i \in I} t_i and an upper bound for \sum_{i \in I} 1/t_i
given by

    ( r1 + rk(M^k - 1) ) / ( r2 + r(M - 1) \sum_{i=1}^{k} M^{k-i} t'_i ).


Theorem 4. Protocol 2 outputs \sum_{i \in I} t_i and an upper bound for
\sum_{i \in I} 1/t_i.

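
The matching step shared by Protocols 1 to 3 relies on the commutativity of exponentiation: h(a)^{k1 k2} equals h(b)^{k2 k1} exactly when a = b (up to hash collisions). The following toy sketch illustrates only this step. It is not the implementation evaluated in Sect. 5: it assumes a multiplicative group modulo the Mersenne prime 2^127 - 1 instead of an elliptic curve, passes associated values in the clear where the protocol would send Paillier ciphertexts E(t_j), and all function names are hypothetical.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy prime modulus; far too small and unstructured for real use


def h(identifier):
    """Hash an identifier into the nonzero residues modulo P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1


def random_exponent():
    """Random exponent coprime to P - 1, so that x -> x^k is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def intersection_sum(x_ids, y_items):
    """Return (|intersection|, sum of matched values) for the toy exchange."""
    k1, k2 = random_exponent(), random_exponent()
    # A -> B: identifiers of X masked with k1.
    a_masked = [pow(h(a), k1, P) for a in x_ids]
    # B -> A: A's values re-masked with k2 (a set models the shuffling),
    # plus B's own identifiers masked with k2 and paired with their values.
    a_double = {pow(v, k2, P) for v in a_masked}
    b_masked = [(pow(h(b), k2, P), t) for b, t in y_items.items()]
    # A: finish masking B's identifiers with k1 and match.
    matched = [t for v, t in b_masked if pow(v, k1, P) in a_double]
    return len(matched), sum(matched)


size, total = intersection_sum(
    ["alice", "bob", "carol"], {"bob": 10, "dave": 7, "carol": 5}
)
print(size, total)
```

In the actual protocols the sum in the last step is computed homomorphically over ciphertexts, so A never sees the individual values.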
4.1 Applicability of Sum Composition 
Let the sum composition measure T be denoted to be

    T = ( r1 + rk(M^k - 1) ) / ( r2 + r(M - 1) \sum_{i=1}^{k} M^{k-i} t'_i ).


Theorem 5. There are fewer than l elements in the intersection with associated
integer values of at most t, where t \in Z^+ satisfies t < l/T.
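
The reasoning behind this statement can be made concrete: if n elements have values of at most t, then the sum of reciprocals is at least n/t, so any upper bound T on that sum forces n <= t * T. The sketch below is an illustration under that reading; the function name and the sample values are assumptions, and T is taken to be the exact reciprocal sum, which is the tightest possible upper bound.

```python
import math
from fractions import Fraction


def max_count_at_most(threshold, T):
    """Largest possible number of elements with value <= threshold,
    given that T upper-bounds the sum of reciprocals of all values."""
    return math.floor(threshold * T)


values = [5, 8, 20, 50]
T = sum(Fraction(1, v) for v in values)      # 79/200

bound_5 = max_count_at_most(5, T)            # at most one value <= 5
bound_2 = max_count_at_most(2, T)            # no value <= 2
print(T, bound_5, bound_2)
```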


For instance, when l = 1, it can be established that there are no small
associated integer values under 1/T contained in the set intersection. In other
words, every element in the intersection has an associated value of at least
1/T. It should
be noted that the applicability of this measure of sum composition is dependent 
on the distribution of the associated values held by B. Generally, this output 
functionality is more useful when the spread of associated values is sufficiently 
large. In practice, B who holds the associated values can decide whether to ini- 
tiate these two protocols based on his dataset. Type 1 enables the transmission 
of an approximate sum composition without any substantial increase in commu- 
nication over the intersection-sum. However, that requires B to reveal an upper 
bound of M to A. We describe a type 2 protocol where communication cost is 
not an overriding consideration without disclosing an upper bound of M to A. 
Type 2 also results in a tighter output approximation compared to type 1. A 
key ingredient involves an injective mapping of the set of reciprocals of positive 
integers to the set of positive integers which preserves addition. Denote f_c to
be such an injective map such that f_c : t_i -> \lceil c/t_i \rceil for a
suitably large constant c.
PROTOCOL 3 (Sum Composition type 2)


Input: A inputs set X; B inputs set Y.

Output: A outputs |X \cap Y'|, B outputs \sum_{i \in I} t_i and an upper bound
for \sum_{i \in I} 1/t_i, where I is the set of indices of b_i \in X \cap Y'.


1. Setup: Identical to Protocol 1. 

2. A’s encryption phase: Identical to Protocol 1. 
3. B’s encryption phase: B

(a) selects a random private exponent k2 \in G;

(b) computes h(a_i)^{k1 k2};

(c) computes {h(b_j)^{k2}, E(t_j), E(f_c(t_j))}.

B returns h(a_i)^{k1 k2} in shuffled order to A. B sends
{h(b_j)^{k2}, E(t_j), E(f_c(t_j))} to A.

4. Matching & homomorphic computations: A
(a) computes {h(b_j)^{k2 k1}, E(t_j), E(f_c(t_j))};

(b) computes the set I of intersection indices where

    I = {j : h(a_i)^{k1 k2} = h(b_j)^{k2 k1} for some i};

(c) additive homomorphically computes

    E( \sum_{i \in I} t_i ) and E( \sum_{i \in I} f_c(t_i) ).

A sends E( \sum_{i \in I} t_i ) and E( \sum_{i \in I} f_c(t_i) ) to B.

5. B’s decryption phase: B performs decryptions of the ciphertexts received
from A to obtain \sum_{i \in I} t_i and an upper bound for \sum_{i \in I} 1/t_i
given by

    (1/c) \sum_{i \in I} f_c(t_i).    (7)


Theorem 6. Protocol 3 outputs \sum_{i \in I} t_i and an upper bound for
\sum_{i \in I} 1/t_i.


4.2 On the Selection of c 


Suppose the values of t_i's are bounded by x bits. We show that taking c to be
of size 2x + 1 bits admits f_c to be injective. Indeed, let t_i, t_j be two
distinct associated values. Without loss of generality, assume t_i < t_j. Then

    1/t_i - 1/t_j = (t_j - t_i) / (t_i t_j) > 1/c    (8)

since c is of size 2x + 1 bits. It follows that

    \lceil c/t_i \rceil \ge c/t_i > c/t_j + 1 > \lceil c/t_j \rceil    (9)

which proves injectivity.
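
Both properties of the map can be checked exhaustively for a small bit length. The sketch below assumes that f_c is the ceiling of c/t, which is the reading under which the decrypted sum divided by c is an upper bound on the sum of reciprocals. The bit length x = 7 and the sample values are illustrative assumptions.

```python
from fractions import Fraction


def f_c(t, c):
    """Ceiling of c / t using integer arithmetic only."""
    return -(-c // t)


x = 7                          # associated values fit in x bits: 1..127
c = 1 << (2 * x)               # 2^(2x) has exactly 2x + 1 bits

# Injectivity over every x-bit value.
images = [f_c(t, c) for t in range(1, 2**x)]

# Upper bound on the sum of reciprocals for a sample intersection.
values = [3, 7, 10]
exact = sum(Fraction(1, t) for t in values)
bound = Fraction(sum(f_c(t, c) for t in values), c)
print(len(set(images)) == len(images), float(exact), float(bound))
```

The gap between the bound and the exact value is below k/c for k values, so a larger c tightens the approximation at the cost of larger plaintexts.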


4.3 Comparisons Between Type 1 and Type 2 


Table 1. Recommended RSA key length from NIST


Security level (k) |   80 |  112 |  128 |  192 |   256
RSA key length     | 1024 | 2048 | 3072 | 7680 | 15360


The NIST report [6] details the recommended RSA key length (in bits) to achieve 
a k-bit security as given in Table 1. In the case of E being the Paillier encryption, 
a 2048-bit length modulus corresponds to a 112-bit security level. This translates 
to a ciphertext of length 4096-bit for Paillier encryption. Thus, this increase in 
communication overhead is approximately in the region of 4096n bits. On the 
other hand for large intersection sizes, type 2 is more practical and has a lower 
computational cost. 


5 Implementation and Performance 


We implemented PSI-Stats in C++. The benchmark machine is a desktop workstation
running on a single thread with an Intel Core i7-7700 CPU @ 3.60 GHz
and 28 GB RAM. The bandwidth is 4867 Mbps with a round-trip time of 0.02 ms.
The respective input sizes m, n are equal and the comparisons are based on the
running time (in seconds) as well as communication cost (in MB). All
experiments apart from the UCI and Kaggle datasets are run at full intersection sizes
(i.e. intersection size = m = n). We set the value of c = 10° for the geomet- 
ric mean protocol. In running Protocol 1, we also provide the readers with the 
results of the actual generalized means alongside the outputs that it obtained 
from the execution of the protocol. The output generalized means values given 
in the tables are all rounded down to the nearest whole number. 

We apply parameters similar to those of the experiments conducted in [22]. The
elements/identifiers in the generated datasets are 128-bit strings, with associated
values being at most 32 bits long (the specific testing range is set between 1 and
100). The sum of the associated values is also bounded by 32 bits. The input set
Table 2. Performance of Private Intersection-arithmetic mean protocol.


Input Size | Offline (s) | Online (s) | Total (s) | Comm. [MB] | Actual Value | Output (rounded down)
1000       | 1.46        | 0.49       | 1.95      | 0.4997     | 51.31        | 51
2000       | 2.84        | 0.94       | 3.78      | 0.9994     | 48.91        | 48
3000       | 4.22        | 1.41       | 5.63      | 1.4991     | 51.76        | 51
4000       | 5.6         | 1.87       | 7.47      | 1.9988     | 44.525       | 44
5000       | 6.96        | 2.33       | 9.29      | 2.4985     | 48.835       | 48
10000      | 14.3        | 4.8        | 19.1      | 4.9976     | 51.64        | 51
50000      | 69.3        | 23.4       | 92.7      | 24.988     | 48.45        | 48
100000     | 142         | 48         | 190       | 49.976     | 50.83        | 50

Fig. 1. Performance on Arithmetic Mean


sizes range from 1000 to 100000. An elliptic curve with 256-bit group elements
constitutes the group G. The hash function SHA-256 is utilized for h. The Paillier
encryption involves the product of two 768-bit primes, which yields a plaintext
space of 1536 bits and a ciphertext length of 3072 bits.

In addition, we have segmented the total time into disjoint durations of the 
offline and online phases. The offline phase refers to the pre-computation process 
where the parties can perform offline computations of their respective individual 
dataset even before the initial round of communication commences. The online 
phase begins from the initial round of communication to the end of the protocol. 
In practice, the online phase duration generally provides a more relevant indi- 
cator of practical performance as opposed to the total time taken. The results 
are presented in Tables 2 and 3, and illustrated in Figs. 1 and 2, which highlight 
that both the running time and communication cost in our protocols are linear 
with respect to the input size. 
Table 3. Performance of Private Intersection-geometric mean protocol.


Input Size | Offline (s) | Online (s) | Total (s) | Comm. [MB] | Actual Value | Output (rounded down)
1000       | 1.53        | 0.52       | 2.05      | 0.4997     | 37.84        | 37
2000       | 2.93        | 0.99       | 3.92      | 0.9994     | 38.07        | 38
3000       | 4.31        | 1.48       | 5.79      | 1.4991     | 36.71        | 36
4000       | 5.88        | 1.91       | 7.79      | 1.9988     | 36.58        | 36
5000       | 7.01        | 2.38       | 9.39      | 2.4985     | 42.288       | 42
10000      | 15.2        | 5.2        | 20.4      | 4.9976     | 44.28        | 44
50000      | 70          | 24.2       | 94.2      | 24.988     | 43.04        | 43
100000     | 151         | 52         | 203       | 49.976     | 37.13        | 37

Fig. 2. Performance on Geometric Mean

Table 4. Performance of PSI-Stats on a UCI repository dataset.

Input Size | Offline (s) | Online (s) | Total (s) | Comm. [MB] | Actual Mean | Output (rounded down)
45211      | 63.17       | 20.78      | 83.95     | 22.591     | 1422.65     | 1422


Table 5. Performance of PSI-Stats on a Kaggle dataset.


Input Size | Offline (s) | Online (s) | Total (s) | Comm. [MB] | Actual Mean | Output (rounded down)
30000      | 45.6        | 15.6       |           |            | 1422.65     | 1422
We also select a couple of actual datasets to conduct our experiments. The
dataset [27] taken from the UCI ML repository [12] relates to marketing
campaigns of a Portuguese banking institution. This dataset consists of a pair of
sets, one of which is a proper subset of the other. We extract the attribute of
interest corresponding to the yearly bank balance of the bank's clients. A holds
the smaller set of size 4521 while B holds the larger set of size 45211. To
simulate a practical setting, each input (representing a client) is assigned a
kojin bango, which is a unique 12-digit ID number issued to residents in Japan
for taxation purposes. Other identifiers such as the Social Security Number
issued in the United States can also be similarly assigned. The set size of A is
then increased to 45211 to match that of B by generating a distinct kojin bango
for each new client. Consequently, the set size is 45211 for both parties and the
intersection corresponds to the original smaller set of size 4521. Related
figures in the computation of the mean yearly balance of common clients between
these two parties via PSI-Stats are presented in Table 4. The second dataset
involves spending from a mall taken from Kaggle [2], which has an input size of
30000. We use the column labelled “payment 2” as the set of corresponding
associated values. The performance of PSI-Stats of selected functionalities on
the Kaggle dataset is recorded in Table 5.


5.1 Comparisons with Circuit-Based PSI Protocols 


The main direct competitor to PSI-Stats is a general-purpose circuit-based PSI.
One advantage of PSI-Stats compared to existing circuit-based approaches is that
it incurs the lowest communication overhead. This is particularly crucial in low
bandwidth settings or where communication cost is at a premium. In this regard,
we demonstrate the comparisons in two aspects. The first evaluation is based
on the monetary cost to run the protocols on an external cloud server, which is
dependent on the computation and the communication cost. This provides a fair
universal comparison of protocols with varying computation and communication
cost. Such a mode of comparison was first introduced in [32] and also applied
in [8]. For this purpose, our reference cloud server is the Amazon Web Service
(AWS) with the reference price model² of (0.005 USD/hr, 0.08 USD/GB). The
second evaluation is based on the run times of protocols when conducted at a
bandwidth setting of 1 Mbps with a round-trip time of 0.02 ms.
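
The monetary cost under this price model is a simple weighted sum of run time and traffic. The sketch below is a hedged reading of the model, not the authors' script: it assumes 1 GB = 1024 MB, and under that assumption it reproduces the 5000-element row of Table 6 for both protocols. The function name is hypothetical.

```python
def monetary_cost_cents(time_s, comm_mb, usd_per_hour=0.005, usd_per_gb=0.08):
    """Cost in US cents of one protocol run under the reference price model."""
    compute_usd = time_s / 3600 * usd_per_hour
    transfer_usd = comm_mb / 1024 * usd_per_gb   # assumes 1 GB = 1024 MB
    return 100 * (compute_usd + transfer_usd)


psi_stats = monetary_cost_cents(time_s=9.29, comm_mb=2.5)
circuit = monetary_cost_cents(time_s=0.649, comm_mb=13.6)
print(round(psi_stats, 4), round(circuit, 4))
```

At these prices the transfer term dominates by one to two orders of magnitude, which is why the lower communication of PSI-Stats outweighs its longer run time.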

The current most efficient state-of-the-art circuit-based PSI protocol is the
recent work of Pinkas et al. [34]. While there are two main approaches for the
generic secure two-party computation of Boolean circuits, the GMW approach
performs better than Yao's garbled circuit on the balance of both communication
and computational cost. Since one of our evaluations is based upon
monetary cost, we shall use the GMW approach of [34] to serve as a benchmark
in the comparisons of PSI-Stats with circuit-based PSI approaches. We run the
“no stash” protocol of [34] along with the arithmetic mean protocol of
PSI-Stats. The results are reflected in Tables 6 and 7. It should be emphasized that


2 https://aws.amazon.com/ec2/spot/pricing, https://aws.amazon.com/cloudfront/pricing/.
Table 6. Comparisons of monetary cost (in cents)


           | Pinkas et al. [34]                  | PSI-Stats
Input Size | Time(s) | Comm.[MB] | Cost(cents) | Time(s) | Comm.[MB] | Cost(cents)
5000       | 0.649   | 13.6      | 0.1063      | 9.29    | 2.5       | 0.0208
10000      | 1.042   | 26.3      | 0.2056      | 19.1    | 5.0       | 0.0417
20000      | 2.077   | 52.9      | 0.4136      | 37.7    | 10.0      | 0.0833
30000      | 3.105   | 79.2      | 0.6192      | 56.3    | 15.0      | 0.1249
2^20       | 107.96  | 2702      | 21.12       | 1902    | 524       | 4.36


Table 7. Comparisons of run time at network bandwidth setting of 1 Mbps.


Pinkas et al. [34]                  | PSI-Stats
Input Size | Time(s) | Comm. [MB] | Input Size | Time(s) | Comm. [MB]
5000       | 113.48  | 13.6       | 5000       | 29.1    | 2.5
10000      | 222.60  | 26.3       | 10000      | 58.3    | 5.0
20000      | 444.01  | 52.9       | 20000      | 119.1   | 10.0
30000      | 666.23  | 79.2       | 30000      | 174.8   | 15.0
300,000    | 6529    | 776        | 2^20       | 6113    | 524


the “no stash” protocol outputs the set intersection (without payload), whereas
the running times of PSI-Stats are for the arithmetic mean of the set
intersection. Substantial modifications have to be incorporated into the
“no stash” protocol to support secure post-processing of the output of the set
intersection (e.g. statistical functions), which incur additional communication
and computational overheads. Nevertheless, the results of [34] in Tables 6 and 7
serve well as a lower bound reference for the output functionality of arithmetic
mean. Moreover, to optimize the efficiency when running the protocol of [34], we
compute in Matlab the minimal number of mega-bins B required such that each
mega-bin contains at most max_b \le 1024 elements, except with probability under
2^{-40}, for various set sizes n. The probability that there exists a bin with at
least max_b elements given in [34] is bounded above by


Eee” 


i=Maxp 


Our computed minimum values of B are 19, 38, 75, 113 for n = 5000, 10000, 
20000, 30000 respectively. 

From the experimental results, PSI-Stats has a lower communication overhead
by an average factor of 5.2x and incurs 5x less monetary cost compared
to [34], as evidenced by Table 6. The results of Table 7 also demonstrate a much
lower run time in a network bandwidth setting of 1 Mbps. Moreover, when the
round-trip time is increased from 0.02 ms to 100 ms, the run time of [34] increases
by 3.2s and 3.84s for set sizes of 2^12 and 2^16 respectively. In contrast, the run
time increase for PSI-Stats is merely 0.38. Since the complexities of PSI-Stats
scale linearly with respect to computation and communication, the comparison
of monetary cost ratio is expected to be maintained at 5x for larger datasets.


6 Related Work 


6.1 Existing PSI Protocols 


An early PSI protocol is based upon the Diffie-Hellman paradigm [26] which is
also applicable in elliptic curve cryptography. A similar idea can be traced back 
to [40]. A method based on oblivious polynomial evaluation was introduced in 
[16,17]. An approach via blind RSA was presented in [11]. All the above methods 
are based on public-key cryptography. 

Oblivious transfer (OT) extension was first introduced in [23]. The main
objective of an OT extension is to enable the computation of a large number of
OTs from a smaller number of OTs along with symmetric cryptographic operations
to achieve better running times. This technique engendered numerous OT-based
PSI protocols. The notion of a garbled Bloom filter based on OT extension was
coined in [13] and utilized to perform PSI. The main idea is to allow one of the 
parties to learn the bit-wise AND of two Bloom filters via OT. This outcome 
results in a valid Bloom filter for the set intersection. That was subsequently 
optimized to some extent in [36]. There are a number of other notable OT-based 
schemes presented in [18,25,30,33,36,37]. The particular work of [25] is based 
on an OT extension protocol found in [24]. 

Recently, threshold PSI protocols have been proposed [42,43]. The earlier 
work [42] leaks the intersection size, while the subsequent work [43] has no 
such leakage. Instead of revealing computation results of associated values, they 
suppress the output if the intersection set does not satisfy the agreed policy. 

Generic multi-party protocols such as garbled circuits can also be used to 
compute PSI. The first such protocol involving garbled circuit appeared in [21] 
which was later improved in [36]. Other notable circuit-based PSI protocols are 
presented in [15,33-35]. The protocol of [10] can incorporate several approaches 
of 2PC beyond garbled circuits. Circuit-based approaches can typically serve for 
generic computation purposes. On the other hand, they result in larger commu- 
nication overheads as compared to other custom-based PSI protocols. 

The most related existing work in relation to this paper is that of Ion et 
al. [22] which considers the single special case where F is the sum. It should be 
noted that any natural attempts to convert the computation of sum to arithmetic 
mean by sending the set intersection size to the receiving party B violates the 
privacy requirements of the problem statement since the additional knowledge 
of cardinality can induce an inference attack. In contrast, our work here provides 
solutions to a large class of statistical functions F without this drawback. By 
doing so our protocols provide more flexible and comprehensive utilities. 
7 Conclusion


We present PSI-Stats which supports the secure computations of various statis- 
tical functions in a privacy-preserving manner. The benefit of PSI-Stats having 
a substantially lower monetary cost and communication cost compared to cir- 
cuit based PSI approaches is desirable in many business applications. This is 
also relevant in environments of low network bandwidths. We have noted from 
our experiments that the run time of all of our protocols is dominated by the 
time taken to perform encryption of each associated value. As such, PSI-Stats is
highly parallelizable and the performance can be further enhanced when multi- 
threading is enabled. In addition, PSI-Stats can easily be extended to enable 
statistical outputs only if the size of the intersection set exceeds a pre-defined 
threshold value. In instances where the intersection set of the identifiers between 
the two parties is null or under a specified threshold size, party A can call for an 
early abort of the protocol. This feature has also appeared in [42] where a secret 
key cannot be recovered if the size is not reached, thereby prompting an abort. 
Abstract. In oblivious finite automata evaluation, one party holds a private
automaton, and the other party holds a private string of characters.
The objective is to let the parties know whether the string is accepted by 
the automaton or not, while keeping their inputs secret. The applications 
include DNA searching, pattern matching, and more. Most of the previous 
works are based on asymmetric cryptographic primitives, such as homo- 
morphic encryption and oblivious transfer. These primitives are signifi- 
cantly slower than symmetric ones. Moreover, some protocols also require 
several rounds of interaction. As our main contribution, we propose an 
oblivious finite automata evaluation protocol via conditional disclosure of 
secrets (CDS), using one (potentially malicious) outsourcing server. This 
results in a constant-round protocol, and no heavy asymmetric-key prim- 
itives are needed. Our protocol is based on a building block called “an 
oblivious CDS scheme for deterministic finite automata” which we also 
propose in this paper. In addition, we propose a standard CDS scheme for 
deterministic finite automata as an independent interest. 


Keywords: Finite automata - Conditional disclosure of secrets - 
Multi-client verifiable computation - Secure multi-party computation 


1 Introduction 


In a problem of oblivious finite automata evaluation, one party holds a private 
automaton, and the other party holds a private string of characters. The objec- 
tive is to let the parties know whether the string is accepted by the automaton 
or not, while keeping their inputs secret. 

The applications include DNA matching, string searching, password format
validation, spam email detection, log file auditing, and more. As stated in [39],
DNA technology can help us predict the probability that a patient will develop a
specific disease, and predict the result of the therapy. However, revealing a
personal DNA sequence to the public can be harmful. An undesired parental
relationship can be discovered, or a person may be refused employment by a
company due to the probability of developing some diseases. Thus, DNA matching
should be performed in an oblivious way. This also applies to other sensitive
information such as passwords, email contents, and log files.

As an example, a patient may want to know if there is any anomaly in his 
or her DNA sequence. Since the DNA sequence can be considered as private 
information, the patient should not reveal his or her DNA in clear. On the other 
hand, a doctor has the anomaly pattern modeled with regular language. The 
pattern can be considered as a valuable research insight, and should also be kept 
secret. To let one or both of the parties know whether the DNA sequence matches 
the pattern, oblivious finite automata evaluation is perfectly suitable here. As 
another example, an email message and a malware pattern can be considered as 
sensitive information. To obliviously check whether the email contains a malware 
or not, the oblivious finite automata evaluation can also be used in the same way. 

However, almost all of the previous works are constructed based on public 
key cryptographic primitives, such as homomorphic encryption and oblivious 
transfer (OT). These asymmetric-key operations (e.g. exponentiation) are some 
orders of magnitude slower than the symmetric-key operations. In addition, some 
of the previous works require several rounds of interaction. 

On the other hand, generic secure multi-party computation protocols can also
be used, but their performance can be worse compared to specifically designed
methods. Executing a string matching algorithm with Yao’s garbled circuit
protocol [29,41] can be inefficient due to the number of comparisons involved in
the dynamic programming technique. Using an information-theoretic protocol is
also possible [11,21], but the process will be interactive, and the round
complexity will depend on the size of the circuit, which can be large in this case.

A verifiable oblivious finite automata evaluation protocol in an outsourced 
setting is also an open problem stated in [42]. 


1.1 Our Contributions 


In this paper, we propose an oblivious finite automata evaluation protocol via 
conditional disclosure of secrets (CDS). This results in a constant-round protocol, 
and no heavy asymmetric-key primitives are needed. We claim three contribu- 
tions of our work as follows. 


Oblivious CDS for DFA. We present the first CDS scheme for the class of
deterministic finite automata (DFA). DFA allows one to compute satisfiability for
regular languages, and therefore is suitable for the aforementioned applications
(e.g., DNA matching). Previous works on CDS were proposed for some other
classes; for example, equality, inner product predicate [15], and set intersection
[7]. To the best of our knowledge, we are the first to consider CDS for DFA.
As a short introduction to (standard) CDS scheme for DFA, the scheme 
involves two senders and a receiver. One sender has a DFA, the other sender 
has an input string, and the receiver knows both the DFA and the input string. 
The two senders also have a common secret and a common randomness which 
are not known to the receiver. Each sender can send only one message to the 
receiver without any communication with the other sender. The goal of the CDS 
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scheme for DFA is to let the receiver know the secret if and only if the automaton 
accepts the input string. 

In this paper, we propose an oblivious CDS scheme for DFA. The main 
difference between the oblivious CDS and the standard CDS is the information 
leaked to the receiver. In the standard CDS, the receiver knows the automaton 
and the string, and hence knows whether the automaton accepts the string, 
whereas the receiver in the oblivious CDS learns none of these. Thus, oblivious 
CDS is more suitable for privacy-preserving applications. 


Oblivious DFA Evaluation Protocol via CDS. We propose an oblivious 
DFA evaluation protocol using the oblivious CDS scheme for DFA as a build- 
ing block. The advantage of our protocol is that it is constant-round and non- 
interactive. Using CDS as the underlying scheme can be seen as a trade-off 
between adding one (potentially malicious) outsourcing server and using asym- 
metric cryptographic primitives. 


Standard CDS for DFA. As a result of independent interest, we also propose a 
standard CDS scheme for DFA. To the best of our knowledge, converting CDS for 
other computation classes to CDS for DFA is not straightforward; ours is the 
first explicit construction of a CDS for DFA. 


1.2 Our Approaches 


One of our goals is to achieve a constant-round protocol for oblivious finite 
automata evaluation. Previous protocols [14,35,42] that run in constant 
rounds for a similar task all use the idea of garbled circuits and require 
asymmetric-key primitives such as homomorphic encryption or oblivious transfer. 
Intuitively, these asymmetric-key primitives play an essential role in hiding 
one party’s private inputs from the other party in the two-party setting. 

Our approach to removing the need for asymmetric-key primitives is to utilize 
an additional (potentially malicious) outsourcing server. We observe that 
oblivious CDS [7] fits well in this context, as it allows an outsourcing server 
to compute a function obliviously without knowing the inputs or the result. 
Moreover, known oblivious CDS schemes do not require costly asymmetric-key 
operations. However, none of the previous CDS constructions supports the 
class of finite automata (even among standard CDS schemes). We therefore 
propose the first oblivious CDS for DFA. 

We adapt the garbled circuit techniques from Frikken [14] to construct our 
oblivious CDS for DFA. The construction includes a pseudorandom function 
(PRF). We then use our oblivious CDS as a building block for our oblivious DFA 
evaluation protocol, based on the multi-client verifiable computation framework 
of Bhadauria and Hazay [7]. Note that the CDS schemes in [7] consider predicates 
of equality and set intersection, which are different from DFA. 

For the standard CDS, we extend the techniques from the ABE context in 
[2] that convert DFA into span programs, and adapt them to the CDS context. Our 
construction of standard CDS is information theoretic. (Our oblivious CDS may 
imply a standard CDS, but that construction would require a PRF.) 


1.3 Related Works 


Previous studies on oblivious evaluation for DFA peaked about 10 or more 
years ago [8,14,35] (with recent improvements such as [42] being less major). 
Very recently (2019-2021), there has been renewed interest in secure computation 
regarding DFA (and a related class, namely NC1) in the context of ABE, in top 
conference papers such as [1,2,22,23,28]. These reflect theoretical and practical 
interest in secure DFA computation. We hope that our work offers practical 
improvements for oblivious DFA evaluation, as our protocol is the first explicit 
constant-round protocol that does not require expensive public-key operations. 
In this subsection, we briefly describe related works. 


Oblivious Finite Automata Evaluation. The problem of oblivious DFA evaluation 
was first studied by Troncoso-Pastoriza et al. [39]. Their protocol is based 
on additive secret sharing, homomorphic encryption, and oblivious transfer. At 
the start of each round (corresponding to each character in the input string), 
both parties hold shares of the current state of the automaton. Homomorphic 
encryption and oblivious transfer are then applied in order to compute the shares 
of the next state. The number of communication rounds is therefore linear 
in the length of the input string. The protocol also requires $O(|x||Q|)$ modular 
exponentiations (where $|x|$ is the length of the input string and $|Q|$ is the total 
number of states of the DFA), which can be a performance drawback. 

The second work, proposed by Frikken [14], reduces the number of 
rounds by using the idea of Yao’s garbled circuit [41]. It also reduces the 
number of modular exponentiations to $O(|x|)$. It is shown in [14] that the 
protocol is 2 to 3 orders of magnitude faster than [39]. However, the protocol is 
still based on oblivious transfer. 

The first protocol that is secure against malicious adversaries was proposed by 
Gennaro et al. [18]. It is based on public-key encryption and zero-knowledge proofs 
of knowledge. The protocol requires several rounds of interaction. Another work 
that discusses security in the malicious setting is that of Mohassel et al. [35]. 
Using an idea similar to [14], they proposed an oblivious evaluation protocol for 
DFA over the alphabet {0,1}. The protocol is based on OT extension [26] against 
malicious adversaries. 

Laud and Willemson [27] modeled the transition function as a polynomial, 
and then evaluated it privately using the arithmetic black box (ABB) model. This 
ABB model can be realized by secret sharing, homomorphic encryption, 
or other primitives. If an information-theoretic primitive such as secret sharing 
is used, the protocol is also information theoretic. However, performing secure 
multiplication on secret shares requires several rounds of interaction. 

Other previous works include that of Di Crescenzo et al. [13], which is 
based on a conditional transfer protocol, and that of Zhao et al. [42], which 
considers a setting with an additively shared input string. 


Oblivious Finite Automata Evaluation with Outsourcing Servers. The 
protocol proposed by Blanton and Aliasgari [8] generalizes the work of [39] to the 
outsourced setting. To keep all the inputs private, their work uses a secret sharing 
technique to outsource the automaton and the input string to two computing 
servers. These servers are assumed to be semi-honest. Oblivious transfer is used 
as a building block. In the case that we want to outsource the inputs to more 
than two servers, threshold homomorphic encryption must be applied. 


Table 1. Comparison between oblivious DFA evaluation protocols 

Protocol                 Primitives   Use Asym.  Parties  Round          Comm. Cost          Security 
Troncoso et al. [39]     SS, HE, OT   Y          2        O(|x|)         O(|x|(|Q| + |Σ|))   SH 
Frikken [14]             OT, PRF      Y          2        O(1)           O(|x||Q||Σ|)        SH 
Gennaro et al. [18]      HE, ZK       Y          2        O(|x|)         O(|x||Q||Σ|)        M 
Mohassel et al. [35]     OT, PRG      Y          2        O(1)           O(|x||Q||Σ|)        M 
Laud&Willemson [27]      ABB          Y/N        2+       O(|Q||Σ|)      O(|x||Q||Σ|)        SH/M 
Crescenzo et al. [13]    PRF          Y          2        O(|x||Q||Σ|)   O(|x||Q||Σ|)        SH 
Zhao et al. [42]         HE, PRG      Y          2        O(1)           O(|x||Q||Σ|)        SH 
Blanton&Aliasgari [8]    SS, OT       Y          4+       O(|x|)         O(|x||Q||Σ|)        SH 
Wei&Reiter [40]          HE           Y          3        O(|x|)         O(|x||Q||Σ|)        SH, M* 
Ours (Section 4)         PRF          N          3        O(1)           O(|x||Q||Σ|)        SH, M† 

Asym: Asymmetric-key Primitives  SS: Secret Sharing  HE: Homomorphic Encryption 
OT: Oblivious Transfer  PRF: Pseudorandom Function  PRG: Pseudorandom Generator 
ZK: Zero-Knowledge Proof  ABB: Arithmetic Black Box (can be implemented from SS or HE) 
SH: Semi-honest  M: Malicious  Y/N and SH/M: Depend on the building block 

* Client can be semi-honest, server can be malicious 
† String and automaton holders can be semi-honest, outsourcing server can be malicious 

Another work in the outsourced setting was proposed by Wei and Reiter [40]. In 
their protocol, a client with a DFA wants to execute it on an encrypted string stored 
on a cloud server. They model the transition function as a polynomial, and then 
evaluate it privately using homomorphic encryption. The decryption key from 
the string owner is shared between the client and the cloud server. 

We note that almost all of the previous works (both with and without 
outsourcing servers) are based on asymmetric cryptographic primitives. 
Some also require several rounds of communication and interaction. A comparison 
between the oblivious evaluation protocols is presented in Table 1. 


CDS. Conditional disclosure of secrets (CDS) was first proposed in [19] as a 
building block for symmetrically private information retrieval (SPIR) systems. 
Their CDS supports conditions equivalent to the monotone access structure of a 
secret sharing scheme. CDS is also used to construct priced oblivious transfer 
(i.e., SPIR with a cost for each item) in [3]. In addition, CDS is used to reduce 
the share size of secret sharing schemes [4-6,30,32]. Some works relate CDS 
to attribute-based encryption (ABE) [15]. Recently, the work of [7] proposed new 
variants of CDS, including private CDS and oblivious CDS. The CDS schemes 
of [7] are for the equality and set intersection classes. The main application of these 
variants is a multi-client verifiable computation protocol. We list some CDS 
schemes from the literature in Table 2 (note that this list is not exhaustive). To the 
best of our knowledge, no CDS for DFA was known prior to our work. 


Table 2. Comparison between CDS schemes 

CDS Scheme            Type       Functionalities                                                 Security 
Gertner et al. [19]   Standard   Monotone access structure                                       Info. theoretic 
Gay et al. [15]       Standard   Equality, Inner product, Index predicate, Prefix, Disjointness  Info. theoretic 
Liu et al. [31]       Standard   Index predicate                                                 Info. theoretic 
Bhadauria&Hazay [7]   Oblivious  Equality                                                        Info. theoretic 
Bhadauria&Hazay [7]   Private    Equality, Inequality, Set intersection cardinality              Computational 
Ours (Section 3)      Oblivious  DFA                                                             Computational 
Ours (Section 5)      Standard   DFA                                                             Info. theoretic 


Multi-client Verifiable Computation. Gennaro et al. [17] were the first to 
propose a definition of verifiable computation protocols. Their construction, based on 
Yao’s garbled circuit, is only for two parties: a client and an outsourcing server. The 
definition was then generalized to the multi-client setting by Choi et al. [9]. Using 
a non-interactive key exchange (NIKE) protocol, their protocol is secure against 
a malicious server and semi-honest clients. Gordon et al. [24] later strengthened the 
security guarantee to the malicious-clients setting, using homomorphic encryption and 
attribute-based encryption as building blocks. The verifiable computation protocols 
existing at that time are quite complex. Recently, Bhadauria 
and Hazay [7] proposed a two-client verifiable computation protocol based on various 
types of CDS. Among the advantages provided by CDS are the simplicity of the 
verification and no need for asymmetric-key primitives. 


1.4 Organization 


After reviewing preliminaries in Sect. 2, we propose an oblivious CDS scheme for 
DFA in Sect. 3. An oblivious DFA evaluation protocol via CDS is presented in 
Sect. 4. As a result of independent interest, we propose a standard CDS scheme for 
DFA in Sect. 5. Finally, Sect. 6 concludes the paper. Proofs are provided in the full version. 


2 Preliminaries 


In this section, we review related background knowledge, including finite 
automata, conditional disclosure of secrets, and multi-client verifiable computation. 
Definitions of PRF, coin-tossing protocol, and monotone span program are 
standard, and are provided in the full version. Matrices are denoted with bold 
capitals. We denote $\{1, \ldots, n\}$ and $\{a, a+1, \ldots, b\}$ with $[n]$ and $[a, b]$, respectively. The 
symbol $\sim$ denotes standard indistinguishability between two distributions, which 
can be information theoretic or computational depending on the case. 


2.1 Finite Automata 


In this paper, we consider deterministic finite automata (DFA), which are a special 
case of nondeterministic finite automata (NFA). 

A deterministic finite automaton is defined by a 5-tuple $M = (Q, \Sigma, \Delta, q_0, F)$ 
where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, 
$\Delta : Q \times \Sigma \to Q$ is a transition function which outputs the next state from the 
current state and the given character, $q_0 \in Q$ is an initial state, and $F \subseteq Q$ is 
a set of accepting states. (In the case of a nondeterministic finite automaton, the 
transition function is generalized to $\Delta : Q \times \Sigma \to 2^Q$.) In this work, we assume 
that states are numbered from 1 to $|Q|$ where $q_0 = 1$, and that the characters of $\Sigma$ 
are numbered from 0 to $|\Sigma| - 1$. In Sect. 5, we also assume that there is only one 
accepting state, $F = \{|Q|\}$. Any DFA can be transformed to satisfy these conditions.¹ 

A string $x = x_0 x_1 \cdots x_{n-1} \in \Sigma^n$ is accepted by the automaton 
$M$ if there is a sequence of states $q_0 q_1 \cdots q_n$ such that $q_i = \Delta(q_{i-1}, x_{i-1})$ for all 
$i \in [n]$, and $q_n \in F$. We say $M(x) = 1$ if $M$ accepts $x$, and $M(x) = 0$ otherwise. 
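
The acceptance condition above can be illustrated with a minimal sketch. It assumes the numbering convention of this section (states 1 to |Q| with initial state 1, characters 0 to |Σ| - 1); the names `DFA`, `evaluate`, and the example automaton `contains_11` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    num_states: int       # states are numbered 1..num_states; the initial state is 1
    alphabet_size: int    # characters are numbered 0..alphabet_size-1
    delta: dict           # transition function: (state, character) -> next state
    accepting: frozenset  # the set F of accepting states


def evaluate(m: DFA, x: list) -> int:
    """Return M(x): 1 if the DFA accepts the string x, and 0 otherwise."""
    q = 1  # q_0 = 1 by the numbering convention
    for ch in x:
        q = m.delta[(q, ch)]
    return 1 if q in m.accepting else 0


# Hypothetical example: binary strings that contain the substring "11".
contains_11 = DFA(
    num_states=3,
    alphabet_size=2,
    delta={(1, 0): 1, (1, 1): 2, (2, 0): 1, (2, 1): 3, (3, 0): 3, (3, 1): 3},
    accepting=frozenset({3}),
)
```

The example automaton already satisfies the single-accepting-state convention, since its only accepting state is |Q| = 3.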


2.2 Conditional Disclosure of Secrets 


In our paper, we focus only on 3-party CDS, where Alice and Bob are senders, 
and Claire is a receiver. Alice has an input $a$ from the domain $\mathcal{A}$, a secret $s$ 
from $\{0,1\}^\kappa$, and a randomness $r$ from the domain $\mathcal{R}$. Bob has an input $b$ from 
the domain $\mathcal{B}$, the same secret $s$, and the same randomness $r$. For the standard 
CDS, Claire only knows the inputs $a$ and $b$. Everyone agrees on a function 
$f : \mathcal{A} \times \mathcal{B} \to \{0,1\}$. Alice and Bob can each send only one message to 
Claire, without any communication with each other. The goal of the scheme is to 
let Claire learn the secret $s$ if $f(a,b) = 1$, and let Claire learn nothing otherwise. 
The definition of CDS is as follows. 


Definition 1 (CDS). Let $f : \mathcal{A} \times \mathcal{B} \to \{0,1\}$ be a condition, $s \in \{0,1\}^\kappa$ be a 
secret, and $r \in \mathcal{R}$ be a randomness chosen uniformly at random. 
Let $\mathsf{Enc}_A$ and $\mathsf{Enc}_B$ be PPT encoding algorithms, and $\mathsf{Dec}$ be a deterministic 
decoding algorithm. The correctness and secrecy properties must hold as follows. 


Correctness: For all inputs $(a,b) \in \mathcal{A} \times \mathcal{B}$ where $f(a,b) = 1$, 
$$\Pr[\mathsf{Dec}(a, b, \mathsf{Enc}_A(a,s,r), \mathsf{Enc}_B(b,s,r)) \neq s] \le \mathsf{negl}(\kappa).$$ 


Secrecy: There exists a polynomial time algorithm $\mathsf{Sim}$ such that for every input 
$(a,b) \in \mathcal{A} \times \mathcal{B}$ where $f(a,b) = 0$ and a secret $s \in \{0,1\}^\kappa$, the following 
distributions are indistinguishable. 


$$\{\mathsf{Sim}(a,b)\}_{a \in \mathcal{A}, b \in \mathcal{B}} \sim \{\mathsf{Enc}_A(a,s,r), \mathsf{Enc}_B(b,s,r)\}_{a \in \mathcal{A}, b \in \mathcal{B}}.$$ 
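
To make the interface of Definition 1 concrete, the following is a minimal sketch of a toy CDS for the equality condition f(a, b) = 1 iff a = b. It is a textbook-style illustration and is not claimed to be the exact scheme of [15] or of this paper. It assumes that inputs and the secret are elements of the prime field with P = 2^61 - 1 and that the common randomness is a pair (r1, r2) of field elements; the names `enc_alice`, `enc_bob`, `dec`, and `sample_randomness` are hypothetical.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; all arithmetic is modulo P (assumption of this sketch)


def sample_randomness():
    """Common randomness r = (r1, r2), shared by Alice and Bob and unknown to Claire."""
    return (secrets.randbelow(P), secrets.randbelow(P))


def enc_alice(a, s, r):
    """Alice's single message; it does not depend on s in this toy scheme."""
    r1, r2 = r
    return (r1 * a + r2) % P


def enc_bob(b, s, r):
    """Bob's single message; the secret is masked by the same affine pad."""
    r1, r2 = r
    return (r1 * b + r2 + s) % P


def dec(a, b, msg_a, msg_b):
    """Claire's decoding: equals s when a == b, and r1*(b - a) + s otherwise."""
    return (msg_b - msg_a) % P
```

When a differs from b, the map from (r1, r2) to the pair of messages is an invertible linear map over the field, so the two messages are uniformly distributed and independent of s, which is the intuition behind the secrecy property.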


One useful variant of CDS, called oblivious CDS, was proposed in [7]. In 
this setting, Claire does not know a and b. Informally, Claire learns a 
value at the end of the scheme, but Claire will not know whether the condition 
is satisfied, or whether the decoded value is equal to the secret. The definition of 
the oblivious CDS from [7] is as follows. For the definition of secrecy of oblivious 
CDS, we use an indistinguishability-based definition, which is equivalent to the 
real-ideal definition in [7]. 


1 This can be done by adding one special character marking the end of the string. 
[Figure: Alice (input a, secret s, randomness r) sends Enc_A(a, s, r) to Claire, and 
Bob (input b, secret s, randomness r) sends Enc_B(b, s, r) to Claire. In the standard 
CDS, Claire knows a and b and knows f(a, b); in the oblivious CDS, she knows neither.] 


Fig. 1. Standard CDS and oblivious CDS 


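Definition 2 (Oblivious CDS). Let $f : \mathcal{A} \times \mathcal{B} \to \{0,1\}$ be a condition, 
$s \in \{0,1\}^\kappa$ be a secret, and $r \in \mathcal{R}$ be a randomness chosen uniformly 
at random. Let $\mathsf{Enc}_A$ and $\mathsf{Enc}_B$ be PPT encoding algorithms, and $\mathsf{Dec}$ 
be a deterministic decoding algorithm. The properties must hold as follows. 


Correctness: For all inputs $(a,b) \in \mathcal{A} \times \mathcal{B}$ where $f(a,b) = 1$, 
$$\Pr[\mathsf{Dec}(\mathsf{Enc}_A(a,s,r), \mathsf{Enc}_B(b,s,r)) \neq s] \le \mathsf{negl}(\kappa).$$ 


Indistinguishability: There exists a polynomial time algorithm $\mathsf{Sim}$ such that for 
every input $(a,b) \in \mathcal{A} \times \mathcal{B}$ and a secret $s \in \{0,1\}^\kappa$, the following distributions 
are indistinguishable. 


$$\{\mathsf{Sim}(1^{|a|}, 1^{|b|}, \mathsf{Dec}(\mathsf{Enc}_A(a,s,r), \mathsf{Enc}_B(b,s,r)))\}_{a \in \mathcal{A}, b \in \mathcal{B}} 
\sim \{\mathsf{Enc}_A(a,s,r), \mathsf{Enc}_B(b,s,r)\}_{a \in \mathcal{A}, b \in \mathcal{B}}.$$ 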


The diagram of the standard and oblivious CDS is shown in Fig. 1. 


2.3 Multi-client Verifiable Computation 


Similar to [7,9], we consider a multi-client verifiable computation (MVC) setting 
where a set of clients outsources the computation to an untrusted computing 
server. We focus on a non-interactive setting where clients do not interact with 
each other after the setup phase. In our work, we consider a setting with semi- 
honest clients and a malicious server (assume no collusion). The clients should 
follow the protocol perfectly, while the outsourcing server may try to change the 
computation result. The definition of MVC from [7,9] is as follows. 


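Definition 3 (MVC). Consider a setting where each client has an input $a_i$, 
and the goal is to compute $f(a_1, \ldots, a_m)$. The MVC protocol consists of four 
algorithms. 


- $\delta \leftarrow \mathsf{Setup}$: Generate a common random string $\delta$ for all clients. 

- $(\alpha_i, \tau_i) \leftarrow \mathsf{Input}(a_i, \delta, 1^\lambda)$: For each client, using $a_i$, $\delta$, and the security 
  parameter $1^\lambda$ as inputs, this algorithm outputs an encoded input $\alpha_i$ and the 
  decoding secret $\tau_i$ kept private by the client. 

- $(\beta_1, \ldots, \beta_m) \leftarrow \mathsf{Compute}(f, \alpha_1, \ldots, \alpha_m)$: Using the function description and 
  the encoded inputs, the computing server executes this algorithm to generate 
  encoded outputs $\beta_i$. 

- $y \cup \{\bot\} \leftarrow \mathsf{Verify}(\beta_i, \tau_i)$: For each client, using $\beta_i$ and $\tau_i$ as inputs, this 
  algorithm generates an output $y$ (which is supposed to be $f(a_1, \ldots, a_m)$), or 
  outputs a symbol $\bot$ in case the server attempted to cheat. 


$\mathsf{PrivClient}^b_{\mathcal{A},i}(1^\lambda)$: 
  $(a^0_1, \ldots, a^0_m), (a^1_1, \ldots, a^1_m) \leftarrow \mathcal{A}_0(1^\lambda)$ 
    where $a^0_i = a^1_i$ 
    and $f(a^0_1, \ldots, a^0_m) = f(a^1_1, \ldots, a^1_m)$ 
  $\delta \leftarrow \mathsf{Setup}$ 
  $(\alpha_j, \tau_j) \leftarrow \mathsf{Input}(a^b_j, \delta, 1^\lambda)$ for all $j \in [m]$ 
  $(\beta_1, \ldots, \beta_m) \leftarrow \mathsf{Compute}(f, \alpha_1, \ldots, \alpha_m)$ 
  $b' \leftarrow \mathcal{A}_1(\beta_i, \tau_i)$ 
  return $b'$ 

$\mathsf{PrivServer}^b_{\mathcal{A}}(1^\lambda)$: 
  $(a^0_1, \ldots, a^0_m), (a^1_1, \ldots, a^1_m) \leftarrow \mathcal{A}_0(1^\lambda)$ 
  $\delta \leftarrow \mathsf{Setup}$ 
  $(\alpha_j, \tau_j) \leftarrow \mathsf{Input}(a^b_j, \delta, 1^\lambda)$ for all $j \in [m]$ 
  $b' \leftarrow \mathcal{A}_1(\alpha_1, \ldots, \alpha_m)$ 
  return $b'$ 


Fig. 2. Privacy for multi-client verifiable computation 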


We are interested in the protocol that is sound and private. 


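Soundness: For all inputs $(a_1, \ldots, a_m)$ and a malicious server $\mathcal{A}$, let $\delta \leftarrow \mathsf{Setup}$, 
$(\alpha_i, \tau_i) \leftarrow \mathsf{Input}(a_i, \delta, 1^\lambda)$, $(\beta_1, \ldots, \beta_m) \leftarrow \mathcal{A}(f, \alpha_1, \ldots, \alpha_m)$, and 
$y \cup \{\bot\} \leftarrow \mathsf{Verify}(\beta_i, \tau_i)$ for all $i \in [m]$. It must hold that 


$$\Pr[y \neq f(a_1, \ldots, a_m)] \le \mathsf{negl}(\lambda).$$ 


Privacy Against the Clients: We consider a setting with an adversarial i-th client. 
From the security game in Fig. 2, the MVC is private against the client if 


$$|\Pr[\mathsf{PrivClient}^0_{\mathcal{A},i}(1^\lambda) = 1] - \Pr[\mathsf{PrivClient}^1_{\mathcal{A},i}(1^\lambda) = 1]| \le \mathsf{negl}(\lambda).$$ 


Privacy Against the Server: We consider a setting with an adversarial server. From 
the security game in Fig. 2, the MVC is private against the server if 


$$|\Pr[\mathsf{PrivServer}^0_{\mathcal{A}}(1^\lambda) = 1] - \Pr[\mathsf{PrivServer}^1_{\mathcal{A}}(1^\lambda) = 1]| \le \mathsf{negl}(\lambda).$$ 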


3  Oblivious CDS for DFA 


In this section, we propose an oblivious CDS scheme for DFA, which is used as 
a building block to construct an oblivious DFA evaluation protocol in the next 
section. 

In the setting of an oblivious CDS scheme for DFA, Alice has a private DFA 
M, Bob has a private input string x, and both have a common secret s and a 
common randomness r. On the other hand, Claire does not have any inputs. The 
condition is defined as f(M,x) = 1 if M accepts x, and f(M,x) = 0 otherwise. 
At the end of the scheme, Claire should learn the secret if and only if M accepts 
x. Our construction is based on the idea of garbled transition matrix from [14]. 
The proposed scheme is shown in Fig. 3. 
Oblivious CDS scheme for DFA 


Input: 

- Alice has a DFA $M = (Q, \Sigma, \Delta, q_0, F)$, a secret $s$, and a randomness $r$. 
- Bob has an input string $x = x_0 x_1 \cdots x_{n-1}$, a secret $s$, and a randomness $r$. 
- Claire has no input. 


Algorithm: 

1. For each $t \in [0, n-1]$, Alice randomly generates a state permutation function 
   $\pi_t : Q \to Q$ and a character permutation function $\phi_t : \Sigma \to \Sigma$ from the 
   randomness $r$. 

2. For each $t \in [0, n-1]$, Alice extracts a garbled state key $k^{state}_{t,\pi_t(q)}$ for each $q \in Q$, 
   and a garbled character key $k^{char}_{t,\phi_t(\sigma)}$ for each $\sigma \in \Sigma$ from the randomness $r$. 

3. For each $t \in [0, n-2]$, $q \in Q$, and $\sigma \in \Sigma$, Alice generates a garbled 
   transition matrix element 

   $$g_{t,\pi_t(q),\phi_t(\sigma)} = H(k^{state}_{t,\pi_t(q)}, k^{char}_{t,\phi_t(\sigma)}) \oplus (k^{state}_{t+1,\pi_{t+1}(\Delta(q,\sigma))} \,\|\, \pi_{t+1}(\Delta(q,\sigma))).$$ 

4. For each $q \in Q$ and $\sigma \in \Sigma$, Alice generates a garbled transition matrix 
   element of the last step. If $\Delta(q,\sigma) \in F$, then 

   $$g_{n-1,\pi_{n-1}(q),\phi_{n-1}(\sigma)} = H(k^{state}_{n-1,\pi_{n-1}(q)}, k^{char}_{n-1,\phi_{n-1}(\sigma)}) \oplus (w \,\|\, mw + s)$$ 

   where $w$ and $m$ are generated from the randomness $r$. If $\Delta(q,\sigma) \notin F$, then 

   $$g_{n-1,\pi_{n-1}(q),\phi_{n-1}(\sigma)} = H(k^{state}_{n-1,\pi_{n-1}(q)}, k^{char}_{n-1,\phi_{n-1}(\sigma)}) \oplus (c \,\|\, d)$$ 

   where $(c, d)$ is a random point not on the line $P(x) = mx + s$. This point 
   $(c, d)$ is not known to Bob. 

5. Alice sends the garbled transition matrix $\{g_{t,\pi_t(q),\phi_t(\sigma)}\}_{t,q,\sigma}$ and the garbled 
   state key of the first state $k^{state}_{0,\pi_0(q_0)}$ together with $\pi_0(q_0)$ to Claire. 

6. For each $t \in [0, n-1]$, Bob generates the character permutation function 
   $\phi_t : \Sigma \to \Sigma$, and extracts the garbled character key $k^{char}_{t,\phi_t(x_t)}$ from the 
   randomness $r$ in the same way as Alice. 

7. Bob randomly chooses a point $(z, mz + s)$ where $z \neq w$ is totally random, and 
   $m$ is generated from the randomness $r$. Note that Alice does not know this 
   point, and this point is different from Alice’s point. 

8. Bob sends $(\phi_t(x_t), k^{char}_{t,\phi_t(x_t)})$ for each $t \in [0, n-1]$, and $(z, mz + s)$ to Claire. 

9. For each $t \in [0, n-1]$, Claire, with the current permuted state $\pi_t(q)$, permuted 
   character $\phi_t(x_t)$, garbled state key $k^{state}_{t,\pi_t(q)}$, and garbled character key $k^{char}_{t,\phi_t(x_t)}$, 
   computes the next permuted state and state key 

   $$(k^{state}_{t+1,\pi_{t+1}(\Delta(q,x_t))} \,\|\, \pi_{t+1}(\Delta(q,x_t))) = g_{t,\pi_t(q),\phi_t(x_t)} \oplus H(k^{state}_{t,\pi_t(q)}, k^{char}_{t,\phi_t(x_t)})$$ 

   and in the last round discovers 

   $$(i \,\|\, j) = g_{n-1,\pi_{n-1}(q),\phi_{n-1}(x_{n-1})} \oplus H(k^{state}_{n-1,\pi_{n-1}(q)}, k^{char}_{n-1,\phi_{n-1}(x_{n-1})}).$$ 

10. Claire interpolates the points $(i, j)$ from the garbled matrix and $(z, mz + s)$ 
    from Bob to find the y-intercept. Claire outputs this value as $s'$. 


Fig. 3. Oblivious CDS scheme for DFA 
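
The steps of Fig. 3 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: HMAC-SHA256 stands in for the PRF H and for deriving values from the randomness r, keys are truncated to 16 bytes, the line P(x) = mx + s lives in the prime field with P = 2^61 - 1, and the permutations are simple shifts. All function names are hypothetical. The sketch additionally keeps z nonzero, since the point (0, s) would reveal s directly.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # prime field for the line P(x) = m*x + s (assumption of this sketch)


def prf(key, *parts):
    """HMAC-SHA256, used here as a stand-in for H and for key derivation from r."""
    msg = b"|".join(p if isinstance(p, bytes) else str(p).encode() for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def rand_int(r, modulus, *label):
    return int.from_bytes(prf(r, *label), "big") % modulus


def pi(r, t, q, nq):   # state permutation at step t (a shift on 1..nq)
    return (q - 1 + rand_int(r, nq, "u", t)) % nq + 1


def phi(r, t, c, na):  # character permutation at step t (a shift on 0..na-1)
    return (c + rand_int(r, na, "v", t)) % na


def k_state(r, t, p):
    return prf(r, "ks", t, p)[:16]


def k_char(r, t, c):
    return prf(r, "kc", t, c)[:16]


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))  # truncates to the shorter input


def line(r):
    """Slope m and abscissa w (nonzero), both derived from the common randomness."""
    return rand_int(r, P, "m"), 1 + rand_int(r, P - 1, "w")


def enc_alice(delta, nq, na, accepting, n, s, r):
    """Steps 1-5: build the garbled transition matrix for a string of length n."""
    m, w = line(r)
    g = {}
    for t in range(n):
        for q in range(1, nq + 1):
            for c in range(na):
                nxt = delta[(q, c)]
                pad = prf(k_state(r, t, pi(r, t, q, nq)), k_char(r, t, phi(r, t, c, na)))
                if t < n - 1:
                    p_next = pi(r, t + 1, nxt, nq)
                    body = k_state(r, t + 1, p_next) + p_next.to_bytes(8, "big")
                elif nxt in accepting:
                    body = w.to_bytes(8, "big") + ((m * w + s) % P).to_bytes(8, "big")
                else:
                    # Fresh point (c_x, d) off the line; Bob never sees it.
                    c_x, d = secrets.randbelow(P), secrets.randbelow(P)
                    if d == (m * c_x + s) % P:
                        d = (d + 1) % P
                    body = c_x.to_bytes(8, "big") + d.to_bytes(8, "big")
                g[(t, pi(r, t, q, nq), phi(r, t, c, na))] = xor(body, pad)
    p0 = pi(r, 0, 1, nq)
    return g, k_state(r, 0, p0), p0


def enc_bob(x, na, s, r):
    """Steps 6-8: permuted characters, their keys, and a second point on the line."""
    m, w = line(r)
    keys = [(phi(r, t, ch, na), k_char(r, t, phi(r, t, ch, na))) for t, ch in enumerate(x)]
    z = w
    while z == w:
        z = 1 + secrets.randbelow(P - 1)  # nonzero and different from w
    return keys, (z, (m * z + s) % P)


def dec(msg_a, msg_b):
    """Steps 9-10: walk the garbled matrix, then interpolate the y-intercept."""
    g, key, p = msg_a
    keys, (z, y) = msg_b
    for t, (pc, kc) in enumerate(keys):
        out = xor(g[(t, p, pc)], prf(key, kc))
        if t < len(keys) - 1:
            key, p = out[:16], int.from_bytes(out[16:], "big")
        else:
            i, j = int.from_bytes(out[:8], "big"), int.from_bytes(out[8:], "big")
    # pow raises ValueError if i == z, which happens only with negligible probability.
    slope = (j - y) * pow((i - z) % P, -1, P) % P
    return (j - slope * i) % P
```

In this sketch the output of `dec` equals s exactly when the automaton accepts the string. The sketch makes no attempt at constant-time operation or side-channel resistance, and a real deployment would need fresh randomness r for every execution.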
As an overview, the scheme can be divided into two parts. In the first part, 
Alice generates a garbled version of the DFA M, and Bob generates keys 
corresponding to the input string x. Claire, who receives the garbled transition 
matrix and keys, can recover an intermediate result embedded inside the garbled 
matrix. This recovered intermediate result depends on the condition of M 
and x. We use state permutation functions to hide the automaton structure, and 
use character permutation functions to hide the input string. Suitable state 
and character permutations can be as simple as $\pi_t(i) = ((i + u_t) \bmod |Q|) + 1$ 
and $\phi_t(i) = (i + v_t) \bmod |\Sigma|$ where $u_t$ and $v_t$ are random shift values. Since 
Claire can only decode the value corresponding to the state at each step, and 
cannot know whether the final state is an accepting state or not, Claire does not 
learn anything from the intermediate result of the garbled matrix. 
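
As a small illustration of the shift permutations mentioned above, the following sketch checks that they are bijections on the state set {1, ..., |Q|} and on the alphabet {0, ..., |Σ| - 1}. The function names are hypothetical, and the shift values are passed in explicitly rather than derived from the common randomness.

```python
def state_perm(i, u, num_states):
    """pi_t(i) = ((i + u_t) mod |Q|) + 1, mapping states 1..|Q| onto 1..|Q|."""
    return (i + u) % num_states + 1


def char_perm(i, v, alphabet_size):
    """phi_t(i) = (i + v_t) mod |Sigma|, mapping characters 0..|Sigma|-1 onto themselves."""
    return (i + v) % alphabet_size


def is_bijection(images, domain):
    """True if the listed images are exactly the elements of the domain, each once."""
    return sorted(images) == sorted(domain)
```

For example, with |Q| = 4 and shift 5, the states 1, 2, 3, 4 map to 3, 4, 1, 2.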

In the second part, Claire decodes the secret from the intermediate result. The 
intermediate result that Claire can recover will be a point on a 2D plane. Before 
sending messages to Claire, Alice and Bob agree on the same linear equation 
P(x) = mx + s where m is generated from the common randomness. If the 
automaton accepts the input string, Claire will recover a random point on this 
line. If not, Claire will recover a random point not on this line. Bob also sends 
the other random point on this line to Claire. Finally, Claire uses two points 
from the garbled matrix and from Bob to decode the secret via interpolation. 
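
The final decoding step can be sketched as a two-point interpolation over a prime field. The modulus P = 2^61 - 1 and the function name `y_intercept` are assumptions of this sketch; the text does not fix a particular field.

```python
P = 2**61 - 1  # prime modulus (assumption of this sketch)


def y_intercept(p1, p2):
    """Return the y-intercept of the line through two points with distinct x-coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    dx = (x1 - x2) % P
    if dx == 0:
        raise ValueError("points must have distinct x-coordinates")
    slope = (y1 - y2) * pow(dx, -1, P) % P
    return (y1 - slope * x1) % P
```

If both points lie on P(x) = mx + s, the result is s. If the first point is off the line and the second point has a nonzero x-coordinate, the result differs from s, because a line through (z, mz + s) with intercept s must have slope m.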

The scheme is secure in the sense that Claire cannot learn anything about 
the inputs and the result. See the following theorem. 


Theorem 1. Assume that H is a secure PRF. The oblivious CDS for DFA in 
Fig. 3 satisfies correctness and indistinguishability as per Definition 2. 


4 Oblivious DFA Evaluation via CDS 


In this section, we present an oblivious DFA evaluation protocol via CDS. The 
protocol shown in Fig. 4 is based on the multi-client verifiable computation 
framework from [7]. In short, we execute two oblivious CDS schemes for DFA 
$M$ and $\overline{M}$, where $\overline{M}$ is the complement DFA of $M$. Since exactly one condition 
must be satisfied, Claire can recover the secret for that condition (but Claire 
will not know which one). Security then follows from the underlying oblivious 
CDS. The protocol is secure against semi-honest Alice and Bob, and malicious 
Claire. Here, Claire can be considered as an untrusted outsourcing server. 

There is a reason why we have to execute two oblivious CDS schemes for 
DFA $M$ and $\overline{M}$. If Alice and Bob execute only one oblivious CDS scheme for 
the DFA $M$, Claire may output a random value, and then Alice and Bob may 
conclude that the DFA does not accept the input string. When the oblivious 
CDS schemes for both $M$ and $\overline{M}$ are executed, it is difficult for Claire to change 
the result, since either $s_1' = s_1$ or $s_2' = s_2$ must be satisfied, but not both. The 
protocol in Fig. 4 satisfies the following theorem. 


Theorem 2. Assume that H in Fig. 3 is a secure PRF. The oblivious DFA 
evaluation protocol in Fig. 4 is a sound and private MVC as per Definition 3. 


Oblivious DFA evaluation protocol via CDS 

Input: 

- Alice has a DFA $M = (Q, \Sigma, \Delta, q_0, F)$. 
- Bob has an input string $x = x_0 x_1 \cdots x_{n-1}$. 
- Claire has no input. 


Algorithm: 

1. $\delta = (s_1, s_2, r_1, r_2) \leftarrow \mathsf{Setup}$: Alice and Bob get common secrets $(s_1, s_2)$ and 
   common randomness $(r_1, r_2)$ from a coin-tossing protocol or a public source 
   of randomness. These values are not known to Claire. 

2. $(\alpha_i, \tau_i) \leftarrow \mathsf{Input}(a_i, \delta, 1^\lambda)$: Alice and Bob execute the oblivious CDS scheme 
   (Fig. 3) for the DFA $M$ and the input string $x$ using the secret $s_1$ and the 
   randomness $r_1$. They also execute the oblivious CDS scheme (Fig. 3) for the 
   DFA $\overline{M} = (Q, \Sigma, \Delta, q_0, Q \setminus F)$ and the input string $x$ using the secret $s_2$ and 
   the randomness $r_2$. Here, $\alpha_1$ and $\alpha_2$ are CDS messages, and $\tau_1 = \tau_2 = (s_1, s_2)$. 

3. $(\beta_1, \beta_2) \leftarrow \mathsf{Compute}(f, \alpha_1, \alpha_2)$: Claire computes the output values $s_1'$ and $s_2'$ 
   from both CDS schemes. Here, $\beta_1 = \beta_2 = (s_1', s_2')$. 

4. $y \cup \{\bot\} \leftarrow \mathsf{Verify}(\beta_i, \tau_i)$: If $s_1 = s_1'$, both Alice and Bob output “M(x) = 1”. 
   If $s_2 = s_2'$, both of them output “M(x) = 0”. Otherwise, output $\bot$. 


Fig. 4. Oblivious DFA evaluation protocol via CDS 
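
The client-side logic of Fig. 4 can be sketched as follows, assuming the two values returned by Claire have already been obtained from two executions of an oblivious CDS. The function names `complement_accepting` and `verify` are hypothetical, and `None` stands in for the symbol ⊥.

```python
def complement_accepting(accepting, num_states):
    """Accepting set of the complement DFA: all states 1..num_states that are not in F."""
    return set(range(1, num_states + 1)) - set(accepting)


def verify(s1, s2, s1_out, s2_out):
    """Verify step of Fig. 4, run by Alice and Bob on Claire's outputs.

    Returns 1 for "M(x) = 1", 0 for "M(x) = 0", and None (standing for the
    symbol ⊥) when neither returned value matches, i.e. the server cheated.
    """
    if s1_out == s1:
        return 1
    if s2_out == s2:
        return 0
    return None
```

Because exactly one of M and its complement accepts x, an honest server returns exactly one matching value, and a server that returns arbitrary values is detected unless it guesses a secret.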


Application. It is not difficult to see how this protocol can be used for DNA 
matching and other applications. In this case, Alice holds a pattern modeled 
as a DFA², and Bob holds a DNA sequence. First, they generate common 
secrets and randomness. Next, Alice generates the garbled transition matrices 
of the DFA and its complement, while Bob transforms the DNA sequence into 
garbled character keys, according to the oblivious CDS scheme for DFA. After 
the oblivious CDS schemes for DFA are executed, Alice and Bob conclude their 
result based on Claire’s outputs. 


4.1 Complexity Analysis 


We now briefly analyze round complexity, communication complexity, and com- 
putational complexity of our protocol in Fig. 4. See Table 1 for more details. 


Round Complexity. In Fig. 4, the two CDS schemes can be executed in parallel. 
Alice and Bob each send one message to Claire, and Claire sends the results 
back. The total number of rounds is 2, which is a constant. Our protocol can 
also be considered as non-interactive. 


2 The method in [38] can be used to transform a pattern $p$ into a finite automaton 
$LEV_d(p)$ that accepts the language $L_d(p)$ containing all strings with Levenshtein 
distance at most $d$ from $p$. It is shown in [39] that a finite automaton for the 
language $\Sigma^* L_d(p) \Sigma^*$ will not have too many states. 


Table 3. Numbers of operations for oblivious DFA evaluation protocols 

                         Automaton Holder                      String Holder                        Outsourcing Server 
Protocol                 Asym. Comp.         Sym. Comp.        Asym. Comp.          Sym. Comp.      Asym. Comp.  Sym. Comp. 
Troncoso et al. [39]     O(|x||Q||Σ|)        O(|x||Σ| + |Q|)   O(|x||Q|)            O(|x|) 
Frikken [14]             O(|x|)              O(|x||Q||Σ|)      O(|x|)               O(|x|) 
Gennaro et al. [18]      O(|Q|(|x| + |Σ|))   -                 O(|x||Q|)            - 
Mohassel et al. [35]     O(|x|)              O(|x|(|Q| + |Σ|)) O(|x|)               O(|x|) 
Laud&Willemson [27]      -                   O(|x||Q||Σ|)      -                    O(|x|) 
Crescenzo et al. [13]    O(|x||Q||Σ|)        O(|Q|)            O(|x||Q||Σ|)         - 
Zhao et al. [42]         O(|x|)              O(|x||Q|)         O(|x|)               O(|x|)          -            - 
Blanton&Aliasgari [8]    -                   -                 -                    -               O(|x|)       O(|x||Q||Σ|) 
Wei&Reiter [40]          O(|x||Q||Σ|)        -                 O(|x|(|Q| + |Σ|))    -               -            - 
Ours                     -                   O(|x||Q||Σ|)      -                    -               -            O(|x|) 

-: Negligible compared to other operations 


Table 4. Numbers of operations for protocols with outsourcing servers 

Protocol                 Automaton Holder      Outsourcing Server 
Blanton&Aliasgari [8]    -                     1-out-of-|Σ|-OT ×2 
                                               1-out-of-|Q||Σ|-OT ×2|x| 
                                               1-out-of-|Q|-OT ×2 
Ours                     PRF ×2|x||Q||Σ|       PRF ×2|x| 


Communication Complexity. The largest part of the communication is the garbled 
transition matrices. Thus, the communication complexity is $O(|x||Q||\Sigma|)$. 


Computational Complexity. We trade the use of asymmetric-key operations 
for one outsourcing server. Hence, the protocol can be more efficient 
than previous works that rely heavily on homomorphic encryption 
and oblivious transfer (OT). 

We briefly compare the numbers of operations of the oblivious DFA evaluation 
protocols in Table 3. Some of the works require at least $O(|x||Q||\Sigma|)$ asymmetric 
computation, and some require somewhat less, at least $O(|x|)$. Ours requires 
zero asymmetric computation. Although [27] also 
requires zero asymmetric computation (depending on the building block), it requires 
$O(|Q||\Sigma|)$ rounds (see Table 1), while ours has constant rounds. 

We compare protocols with outsourcing servers in Table 4. Blanton and Aliasgari [8] 
use $2|x|$ applications of 1-out-of-$|Q||\Sigma|$ OT, while our protocol uses $2|x||Q||\Sigma|$ 
applications of PRF. OT typically uses public-key operations such as modular 
exponentiations (e.g., [10,16]), while PRF can be based on symmetric ones 
such as block ciphers or keyed one-way hash functions (e.g., [12,20]), which are 
several orders of magnitude more efficient than public-key operations [36]. This 
suggests that ours should be fundamentally faster than [8]. In more detail, 
according to the state-of-the-art schemes for fast 1-out-of-n OT in [10,16,33,34], 
running m applications of 1-out-of-n OT requires O(m) modular exponentiations 
and O(nm) overall time. This suggests that [8] requires $O(|x|)$ asymmetric-key 
operations and $O(|x||Q||\Sigma|)$ symmetric operations. On the other hand, ours uses 
$2|x||Q||\Sigma|$ applications of PRF. 
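
The PRF counts for our protocol in Table 4 can be tabulated with a small helper. The function name and the example parameters (a DNA alphabet of size 4, 50 states, and a string of length 1000) are hypothetical and chosen only to give a sense of scale.

```python
def prf_calls_ours(x_len, num_states, alphabet_size):
    """Return (automaton holder, outsourcing server) PRF applications, as in Table 4."""
    # Two garbled matrices (for M and its complement), one PRF call per entry.
    holder = 2 * x_len * num_states * alphabet_size
    # One PRF call per character of the string, for each of the two CDS instances.
    server = 2 * x_len
    return holder, server
```

For the example parameters, `prf_calls_ours(1000, 50, 4)` gives 400000 PRF applications for the automaton holder and 2000 for the outsourcing server.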

We also note that, even considering only the less dominant part of the computation 
in OT (besides its public-key operations), running one OT application typically 
already requires more than one application of PRF. As theoretically justified by 
[25], OT is an expensive operation compared to the evaluation of a PRF or a 
PRG. The OT protocol in [36] also uses a PRF as a building block. This also 
confirms that our protocol should be fundamentally faster than [8]. 


5 CDS for DFA 


As a result of independent interest, we propose a standard CDS for DFA in this section.
In the setting of a standard CDS scheme for DFA, Alice has a DFA $M$, a secret
$s$, and a randomness $r$, while Bob has an input string $x$, the same secret $s$, and
the same randomness $r$. Claire knows both $M$ and $x$, but does not know $s$ or $r$.
Claire can learn $s$ if and only if $M(x) = 1$, and learns nothing else if $M(x) = 0$.
As an overview of our construction, we transform the automaton and the input
string into monotone span programs using a method from [2], and then construct
a CDS scheme for those span programs. The transformations are applied to both
the automaton and the input string in order to polynomially bound the size of the span
program. Note that the standard CDS for DFA in this section does not require a
PRF.³ The complexity analysis of our standard CDS for DFA is in the full version.


5.1 Transform an Input String to a MSP 


The following transformation from an input string to a MSP is from [2]. We
extend it to support alphabets of any size. An input string $x$ is transformed
to a MSP $(L_x, \rho_x)$, and a DFA $M$ is transformed to a set of attributes
$S_M$. The universe of attributes for DFA is denoted as


$$U_M = \{(\sigma, i, j) : i, j \in [Q_{max}],\ \sigma \in \Sigma\} \cup \{\text{“Size = } i\text{”} : i \in [Q_{max}]\} \cup \{\text{“Dummy”}\}$$


where $Q_{max}$ is the maximum number of states that all parties agree on. Each
attribute can be represented by an integer using the following mapping.


$$\text{“Dummy”} \mapsto 0, \qquad (\sigma, i, j) \mapsto 2|\Sigma|((i + j)^2 + j) + 2\sigma, \qquad \text{“Size = } i\text{”} \mapsto 2i + 1$$
A DFA $M = (Q, \Sigma, \Delta, q_0, F)$ is transformed into a set of attributes
$$S_M = \{\text{“Dummy”}\} \cup \{(\sigma, i, j) \in \Sigma \times Q^2 : j = \Delta(\sigma, i)\} \cup \{\text{“Size = } |Q|\text{”}\}.$$


To transform an input string $x = x_0 x_1 \ldots x_{n-1}$ with length $|x| = n$ to a MSP,
we define the following matrices as in [2].


- $I_n$ denotes the $n \times n$ identity matrix.
- $g_n$ and $0_n$ denote the column vectors $(1, \ldots, 1)^\top$ and $(0, \ldots, 0)^\top$ of size $n$.
- $0_{m \times n}$ denotes the zero matrix of size $m \times n$.


3 In practice, a PRF can be used to reduce the size of the common randomness.

Table 5. Submatrix $[V_n \| W_n]^{(\sigma)}$ associated with $\sigma$ (refer to Table 4 in [2]) and a MSP
$(L_x, \rho_x)$ from an input string $x$ (refer to Table 5 in [2])

(Table body omitted: the matrix layout could not be recovered from the source text. Its rows are labeled “Dummy” and $(\sigma, i, j)$, and its blocks are built from $I_n$, $g_n$, $0_n$, $0_{n^2 \times n}$ and $[V_n \| W_n]$.)

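- $V_n = I_n \otimes g_n = \begin{pmatrix} g_n & 0_n & \cdots & 0_n \\ 0_n & g_n & \cdots & 0_n \\ \vdots & \vdots & \ddots & \vdots \\ 0_n & 0_n & \cdots & g_n \end{pmatrix}$ of size $n^2 \times n$.
- $W_n = -g_n \otimes I_n = [-I_n \| \cdots \| -I_n]^\top$ of size $n^2 \times n$.
- For each $\sigma \in \Sigma$, define $[V_n \| W_n]^{(\sigma)}$ associated with $\sigma$ as shown in Table 5
(left). Each row corresponds to $(\sigma, i, j)$ for all $i, j \in [n]$.

The MSP matrix $L_x$ with labeling function $\rho_x$ is shown in Table 5 (right).
We refer to the following theorem proved in [2].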


Theorem 3. Using the method above, let $(L_x, \rho_x)$ be a MSP constructed from
an input string $x$, and $S_M$ be a set of attributes constructed from a DFA $M$. We
have $(L_x, \rho_x)$ accepts $S_M$ if and only if $M(x) = 1$ and $|Q| < n$.


5.2 Transform a DFA to a MSP 


Similar to the previous subsection, we also transform a DFA to a MSP. We
follow the transformation from a DFA to a MSP from [2], extended
to support alphabets of any size. The universe of attributes for input strings is
denoted as


$$U_x = \{(i, \sigma) : i \in [0, n_{max}],\ \sigma \in \Sigma\} \cup \{\text{“Length = } i\text{”} : i \in [n_{max}]\} \cup \{\text{“Dummy”}\}$$


where $n_{max}$ is the maximum length of an input string that all parties agree on. Each
attribute can be represented by an integer using the following mapping.


$$\text{“Dummy”} \mapsto 0, \qquad (i, \sigma) \mapsto (|\Sigma| + 1)i + \sigma + 1, \qquad \text{“Length = } i\text{”} \mapsto (|\Sigma| + 1)(i + 1)$$


An input string $x = x_0 x_1 \ldots x_{n-1}$ is transformed into a set of attributes


$$S_x = \{\text{“Dummy”}\} \cup \{(i, x_i) : i \in [0, n - 1]\} \cup \{\text{“Length = } n\text{”}\}.$$
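
The integer mapping and the attribute set $S_x$ can be sketched as follows. This is only an illustration under our own assumptions: symbols are taken to be integers in $\{0, \ldots, |\Sigma| - 1\}$, and the names `encode_attr`, `string_to_attrs` and the tags `"pos"` and `"len"` are hypothetical, not part of the construction in [2].

```python
def encode_attr(attr, sigma_size):
    """Map an attribute of the universe U_x to an integer."""
    if attr == "Dummy":
        return 0
    kind, *rest = attr
    if kind == "pos":                 # (i, sigma): symbol sigma at position i
        i, sym = rest
        return (sigma_size + 1) * i + sym + 1
    if kind == "len":                 # "Length = i"
        (i,) = rest
        return (sigma_size + 1) * (i + 1)
    raise ValueError(f"unknown attribute: {attr!r}")


def string_to_attrs(x, sigma_size):
    """Build S_x = {Dummy} U {(i, x_i)} U {Length = n} as a set of integers."""
    n = len(x)
    attrs = ["Dummy"] + [("pos", i, x[i]) for i in range(n)] + [("len", n)]
    return {encode_attr(a, sigma_size) for a in attrs}


# Binary alphabet, input string x = 101 (symbols assumed to be 0 and 1).
print(sorted(string_to_attrs([1, 0, 1], 2)))  # [0, 2, 4, 8, 12]
```

With this encoding, position attributes never collide with length attributes, because the latter are exactly the positive multiples of $|\Sigma| + 1$.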
Table 6. A MSP $(L_M, \rho_M)$ from a DFA $M$ (refer to Table 2 in [2])

(Table body omitted: the matrix layout could not be recovered from the source text. Its rows are labeled “Dummy”, “$x_i = \sigma$” for each position $i$ and symbol $\sigma$, and “Length = 1” up to “Length = $|Q|$”, and its blocks are built from $I_{|Q|}$, $Y^{(\sigma)}$ and $0_{|Q| \times |Q|}$.)


To transform a DFA $M = (Q, \Sigma, \Delta, q_0, F)$ to a MSP matrix, we define a
submatrix $Y^{(\sigma)}$ of size $|Q| \times |Q|$ for each $\sigma \in \Sigma$ in the same way as [2]. The
cell of $Y^{(\sigma)}$ at position $(i, j)$ is $-1$ if $j = \Delta(i, \sigma)$, and is 0 otherwise. The MSP
matrix $L_M$ with a labeling function $\rho_M$ is shown in Table 6. We refer to the
following theorem proved in [2].
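
As a small illustration of the submatrices $Y^{(\sigma)}$ only (not of the full matrix $L_M$), the sketch below builds them from a transition table. We assume states are numbered from 0 and that the transition function is given as a dictionary; the name `transition_submatrices` is hypothetical.

```python
def transition_submatrices(num_states, alphabet, delta):
    """Return {sigma: Y} where Y[i][j] = -1 if j == delta[(i, sigma)], else 0."""
    Y = {}
    for sym in alphabet:
        mat = [[0] * num_states for _ in range(num_states)]
        for i in range(num_states):
            mat[i][delta[(i, sym)]] = -1
        Y[sym] = mat
    return Y


# Toy DFA tracking the parity of the number of '1' symbols read so far.
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
Y = transition_submatrices(2, ["0", "1"], delta)
print(Y["0"])  # [[-1, 0], [0, -1]]
print(Y["1"])  # [[0, -1], [-1, 0]]
```

Since the automaton is deterministic, every row of each $Y^{(\sigma)}$ contains exactly one entry equal to $-1$.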


Theorem 4. Using the method above, let $(L_M, \rho_M)$ be a MSP constructed from
a DFA $M$, and $S_x$ be a set of attributes constructed from an input string $x$. We
have $(L_M, \rho_M)$ accepts $S_x$ if and only if $M(x) = 1$ and $n < |Q|$.


5.3 CDS for MSP 


A CDS scheme for MSP is proposed in Fig. 5. In this setting, Alice has a MSP
$(L, \rho)$, a secret $s$, and a randomness $r$. Bob has a set of attributes $S$, the same
secret $s$, and the same randomness $r$. Claire has only $(L, \rho)$ and $S$. The idea is
that Alice multiplies the MSP matrix by a secret vector, masks
the results with random values, and then sends them to Claire. At the same time,
Bob sends the random values corresponding to the set of attributes. If the set of
attributes satisfies the MSP, the random masks can be cancelled out, and
the secret can be recovered. The method can be considered as a linear secret
sharing based on MSP.

From the scheme, Claire will only learn the values from rows corresponding 
to the set of attributes. Correctness and security then follow from linear secret 
sharing of MSP. We have the following theorem. 
CDS scheme for MSP
Input:


- Alice has a MSP $(L, \rho)$, where $L$ is an $\ell \times m$ matrix, a secret $s$, and a randomness $r$.
- Bob has a set of attributes $S = \{u_1, \ldots, u_n\}$, a secret $s$, and a randomness $r$.
- Claire has the MSP $(L, \rho)$, and the set of attributes $S$.


Algorithm:


1. Alice randomly generates an $m$-element vector $v = (v_1, \ldots, v_m)$ such that the
first element is $v_1 = s$.

2. For each $i \in [\ell]$, let $L_i$ (the $i$-th row of $L$) be the $j$-th row that is associated
with the attribute $u$. Alice calculates $w_i = L_i \cdot v + r_{u,j}$, where $r_{u,j}$ is extracted
from the randomness $r$. Alice then sends $\{w_i\}_{i \in [\ell]}$ to Claire.

3. For each $u \in S$, Bob extracts $\{r_{u,j}\}_{j \in [\ell_{max}]}$ from the randomness $r$ in the
same way as Alice, where $\ell_{max}$ is the maximum number of rows that can be
associated to an attribute. Bob then sends $\{r_{u,j}\}_{u \in S, j \in [\ell_{max}]}$ to Claire.

4. For each $L_i$ that can map to an attribute in $S$, Claire computes $(L_i \cdot v)$
from $w_i - r_{u,j}$. If $(L, \rho)$ accepts $S$, then there exists a vector $\omega$ such that
$\omega \cdot L_S = (1, 0, \ldots, 0)$. Claire can calculate $s$ from $\omega \cdot (L_S \cdot v)$.


Fig. 5. CDS scheme for MSP
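
The four steps of Fig. 5 can be sketched over a toy prime field as follows. This is an illustration under our own assumptions rather than the exact scheme: the modulus `P`, the use of SHA-256 to derive $r_{u,j}$ from the shared randomness, and all function names are hypothetical choices, and the reconstruction vector $\omega$ is found by plain Gaussian elimination.

```python
import hashlib
import secrets

P = 2**61 - 1  # toy prime modulus (assumption; the field is not fixed here)


def mask(r, u, j):
    """Derive r_{u,j} from the shared randomness r (SHA-256 as a stand-in)."""
    digest = hashlib.sha256(f"{r}|{u}|{j}".encode()).digest()
    return int.from_bytes(digest, "big") % P


def row_labels(rho):
    """rho[i] is the attribute of row i; return (u, j): row i is the j-th row of u."""
    count, labels = {}, []
    for u in rho:
        count[u] = count.get(u, 0) + 1
        labels.append((u, count[u]))
    return labels


def alice(L, rho, s, r):
    """Steps 1-2: w_i = L_i . v + r_{u,j} with v = (s, random, ..., random)."""
    v = [s] + [secrets.randbelow(P) for _ in range(len(L[0]) - 1)]
    return [(sum(a * b for a, b in zip(row, v)) + mask(r, u, j)) % P
            for row, (u, j) in zip(L, row_labels(rho))]


def bob(S, r, l_max):
    """Step 3: the masks of every attribute in S."""
    return {(u, j): mask(r, u, j) for u in S for j in range(1, l_max + 1)}


def solve(A, b):
    """Return one x with A x = b over Z_P, or None if no solution exists."""
    rows, cols = len(A), len(A[0])
    M = [[a % P for a in A[i]] + [b[i] % P] for i in range(rows)]
    pivots = []
    for c in range(cols):
        k = len(pivots)
        piv = next((i for i in range(k, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[k], M[piv] = M[piv], M[k]
        inv = pow(M[k][c], -1, P)
        M[k] = [a * inv % P for a in M[k]]
        for i in range(rows):
            if i != k and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * p) % P for a, p in zip(M[i], M[k])]
        pivots.append(c)
    if any(M[i][cols] for i in range(len(pivots), rows)):
        return None
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    return x


def claire(L, rho, S, w, masks):
    """Step 4: unmask the rows of S, find omega with omega . L_S = e_1, output s."""
    idx = [(i, lab) for i, lab in enumerate(row_labels(rho)) if lab[0] in S]
    if not idx:
        return None
    shares = [(w[i] - masks[lab]) % P for i, lab in idx]     # L_i . v
    m = len(L[0])
    L_S_T = [[L[i][c] for i, _ in idx] for c in range(m)]    # transpose of L_S
    omega = solve(L_S_T, [1] + [0] * (m - 1))
    if omega is None:
        return None                                          # (L, rho) rejects S
    return sum(o * sh for o, sh in zip(omega, shares)) % P


L, rho = [[1, 1], [0, -1]], ["A", "B"]   # toy MSP for the policy "A and B"
w = alice(L, rho, 42, "shared-r")
print(claire(L, rho, {"A", "B"}, w, bob({"A", "B"}, "shared-r", 1)))  # 42
print(claire(L, rho, {"A"}, w, bob({"A"}, "shared-r", 1)))            # None
```

In the toy policy, the two rows sum to $(1, 0)$, so Claire recovers $s$ only when she receives the masks of both attributes.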


Theorem 5. The CDS scheme for MSP proposed in Fig. 5 is correct and secure.
That is, when $(L, \rho)$ accepts $S$, we have
$\Pr[\mathrm{Dec}((L, \rho), S, \mathrm{Enc}_A((L, \rho), s, r), \mathrm{Enc}_B(S, s, r)) = s] = 1$.
And when $(L, \rho)$ does not accept $S$, there exists a simulator $\mathrm{Sim}$ such that
$\{\mathrm{Sim}((L, \rho), S)\} \sim \{\mathrm{Enc}_A((L, \rho), s, r), \mathrm{Enc}_B(S, s, r)\}$.


5.4 CDS for DFA 


We are now ready to use the building blocks from the previous subsections to construct
a CDS scheme for DFA. The scheme is shown in Fig. 6. Alice and Bob
first transform a DFA and an input string into MSPs and sets of attributes.
After that, they execute two CDS schemes for MSP in parallel. If the automaton
accepts the input string, Claire will learn the secret of the CDS scheme. Correctness
and security follow from the schemes in the previous subsections. We have
the following theorem.


Theorem 6. The CDS scheme for DFA proposed in Fig. 6 is correct and
secure. That is, when a DFA $M$ accepts an input string $x$, it holds that
$\Pr[\mathrm{Dec}(M, x, \mathrm{Enc}_A(M, s, r), \mathrm{Enc}_B(x, s, r)) = s] = 1$.
And when $M$ does not accept $x$, there exists a simulator $\mathrm{Sim}$ such that
$\{\mathrm{Sim}(M, x)\} \sim \{\mathrm{Enc}_A(M, s, r), \mathrm{Enc}_B(x, s, r)\}$.
CDS scheme for DFA
Input:


- Alice has a DFA $M = (Q, \Sigma, \Delta, q_0, F)$, a secret $s$, and randomness $(r_1, r_2)$.
- Bob has an input string $x = x_0 x_1 \ldots x_{n-1}$, a secret $s$, and randomness $(r_1, r_2)$.
- Claire has the finite automaton $M$, and the input string $x$.


Algorithm:


1. Alice generates $(L_M, \rho_M)$ and $S_M$ from $M$.

2. Bob generates $(L_x, \rho_x)$ and $S_x$ from $x$.

3. Alice and Bob execute two CDS schemes with inputs $((L_M, \rho_M), S_x, s, r_1)$ and
inputs $((L_x, \rho_x), S_M, s, r_2)$ at the same time.

4. If $M$ accepts $x$, then Claire can recover the secret $s$.


Fig. 6. CDS scheme for DFA


6 Concluding Remarks 


In this paper, we propose an oblivious CDS scheme for DFA. We then use it
as a building block to construct an oblivious DFA evaluation protocol. We also
propose a standard CDS scheme for DFA as a result of independent interest.

Some of the previous works, including [27,37], considered oblivious CDS schemes
for NFA. We believe that our work could be extended to that setting.
Although there exists an algorithm to
convert an NFA into a DFA, the number of states in the resulting DFA can be
exponentially large. Thus, schemes that support NFA directly can have an advantage in this case.

At this point, we do not know how to construct an oblivious (or even a
standard) CDS scheme for NFA without using asymmetric-key operations. Using
the method in Sect. 3 can leak the structure of the NFA to Claire. This is because
at each step, Claire will have more than one state key, and may learn which
states have transitions to the same state. Extending the method from [2] to the
NFA setting is also not trivial. We leave this problem as future work.

In addition, it would be interesting to extend our protocol to a setting with malicious
Alice and Bob. It would also be interesting to construct oblivious evaluation
protocols for pushdown automata and Turing machines. The difficulty is how to
keep track of the memory obliviously. Moreover, we would like to construct
oblivious protocols via CDS for Moore machines and Mealy machines as well.
Although we can embed the output information into the garbled transition
matrix in the same way as [14], the output from this revised protocol will no longer
be verifiable or secure against a malicious Claire.
Abstract. We propose an efficient oblivious transfer in the random oracle
model based on public key encryption with pseudorandom public
keys. The construction is as efficient as the state of the art, yet it has a
significant advantage: it has a tight security reduction to the multi-user
security of the underlying public key encryption. In previous constructions,
the security reduction has a multiplicative loss that amounts to
at least the number of adversarial random oracle queries. When considering
this loss for a secure parameter choice, the underlying public key
encryption or elliptic curve would require a significantly higher security
level, which would decrease the overall efficiency.

Our OT construction can be instantiated from a wide range of assumptions
such as DDH, LWE, or code-based assumptions, as well as many
public key encryption schemes such as the NIST PQC finalists. Since
tight multi-user security is a very natural requirement which many public
key encryption schemes satisfy, many public key encryption schemes
can be straightforwardly plugged into our construction without the need
to reevaluate or adapt any parameter choices.


1 Introduction 


An oblivious transfer (OT) [Rab81,EGL82] is an interactive protocol between
two parties called a sender and a receiver. At the end of the protocol, the sender
outputs two messages $m_0, m_1$ while the receiver outputs $b, m_b$ for a choice bit
$b$. Security requires that the sender does not learn $b$ and the receiver does not
learn $m_{1-b}$. OT is a fundamental building block in cryptography [Kil88], particularly
in secure multi-party computation (MPC) [Yao82,Yao86,CvT95,IPS08,
IKO+11,BL18,GS18], which allows mutually distrusting parties to securely perform
joint computations on their privately held data. MPC has a plethora of
applications in practice, for example, in securely training machine learning models
(e.g. [MR18]), private set intersection (e.g. [KKRT16,PRTY20]), etc. In fact,
a significant body of practically efficient MPC protocols relies primarily on the
primitive of OT (e.g. [NNOB12,KOS16]), which makes efficient secure OT an
important and very natural objective.


Part of the work was done while the authors were at Visa Research. 
In recent years, there has been significant progress in making OT
more efficient. Chou and Orlandi [CO15] proposed a very efficient OT in the
random oracle model [BR93,CGH98] based on the DDH assumption. It turned
out that it does not achieve UC security [GIR17,HL17], but only stand-alone
security. Masny and Rindal [MR19] proposed an OT from public key encryption
(PKE) with pseudorandom public keys that is also very efficient, but in addition is
UC secure and can be instantiated from a variety of assumptions such as LWE
or code-based assumptions. The construction makes it very easy to plug in
PKE schemes such as the NIST PQC candidates [SAB+20,DKR+20,CDH+20,
ABC+20], which is a significant advantage over more tailored constructions of OT
based on DDH [CSW20], LWE [PVW08,BD18,BDK+20] or McEliece [DvMN08,
DNM12]. McQuoid, Rosulek and Roy [MRR20,MRR21] gave a more modular
analysis of this approach, extended it to PKEs with pseudorandom ciphertexts
(PKE B), and increased the efficiency when multiple OTs are run in parallel.
Masny and Watson [MW21] increased the efficiency by leveraging a PKI.

This approach works as follows. Using a specific query pattern to a random
oracle, a receiver can freely choose one public key while a second public key will
be completely determined by the random oracle. At the same time, a sender
can reproduce the same queries and public keys and then encrypt one OT string
under each of the public keys, though he will not be able to determine which of
the keys has been freely chosen by the receiver. The receiver, in turn,
can only recover the string under the freely chosen public key but not the other.
Unfortunately, this approach has a drawback: the receiver could
repeat the query pattern to the random oracle until he finds a public key that
might be easier to break than the average public key and then try to recover
both strings. Typically, a PKE is hard to break for a random public key with
overwhelming probability, and therefore this should not cause an issue. Nevertheless,
it limits how tightly one can prove the security of the OT protocol based
on a PKE scheme.

This drawback can be resolved by using a PKE that is tightly
secure in the multi-user setting. Tight multi-user security has received
significant attention in the context of key exchange, PKE and signatures
[Has88,BBM00,HJ12,Zav12,BHJ+15,KMP16,CKMS16,GKP18,GJ18,
PR20,LLGW20,JKRS21]. Bellare, Boldyreva and Micali [BBM00] showed that
ElGamal is tightly secure even when multiple challenge ciphertexts are given
to the adversary. There are numerous works that focus on tight multi-user
secure PKE [Has88,HJ12,Zav12,CKMS16,GKP18]. The tightness requirement
does not put significant restrictions on known PKEs. Tight multi-user security
is a very natural property that a PKE should typically have, since usually the
security of all users and not just of a single user needs to be considered. Non-tightness
would demand an increase in the bit security level of a PKE when used
across many users, which would render the PKE significantly less efficient.

Unfortunately, using a tightly secure PKE in the multi-user setting is not
sufficient. The security analysis of [MR19,MRR20,MRR21] also involves reprogramming
the random oracle and guessing which query a malicious receiver will
later use during the OT protocol. This comes at the cost of a security loss which
is multiplicative in the number of adversarial random oracle queries. This issue
seems to require a more in-depth analysis of this approach to constructing OT
and raises the question of whether a similar construction could achieve tight security.
In this paper, we answer the following question:


Can we construct efficient OT that is tightly secure in the ROM from 
public-key encryption? 


1.1 Our Contribution 


We propose a new construction of OT in the random oracle model which can 
be proven tightly secure based on the multi-user security of the underlying 
PKE. This approach follows the paradigm of Masny and Rindal [MR19, MRR20, 
MRR21,MW21] by specifying a pattern of random oracle queries which allows 
a malicious receiver to choose one public key freely while a second one is deter- 
mined by the random oracle. 

We use a mild notion of multi-user security which is weaker than the notion
proposed in previous literature such as [BBM00]. In our notion, we require that
an adversary receives $n$ user public keys and then decides for which one he wants to
see a challenge ciphertext. The notion of [BBM00] allows an adversary to see challenge
ciphertexts for all of the public keys. Nevertheless, there are many PKEs
that achieve even the stronger notion of [BBM00] with a tight security proof
under the DDH or LWE [Reg05] assumption. We recap the most basic PKEs and
their tight reductions to DDH and LWE in Sect. 3. The results extend straightforwardly
to code-based schemes, the ring or module LWE [LPR10,BGV12,LS15]
setting, or elliptic curves.

For our OT, we require a second property, namely the pseudorandomness of
the public keys. This requirement is the same as in [MR19], with the exception
that it holds tightly based on the underlying assumption even when $n$ keys are
seen. We recap this property as well in Sect. 3 for the PKEs of interest.

In Fig. 1, we compare our result with previous works. Since the main difference
between our construction and [MR19] is how the random oracle is used, the efficiency
of our OT is very similar to [MR19]. On the one hand, we need to compute
3 additional hash evaluations. The hash evaluations are standard evaluations
mapping onto $\{0,1\}^\lambda$ and, when using elliptic curves, not to curve points. On
the other hand, similar to [MRR20], we are actually able to reduce the communication
complexity on the receiver's side from $2|pk|$ ([MRR20]) to $|pk| + 2\lambda$.
In particular, when instantiating the OT with lattice or code-based schemes
[SAB+20,DKR+20,CDH+20,ABC+20], which have rather long keys, this is a
significant reduction. Even when instantiating the OT with ElGamal encryption,
we need to sample one random group element less, which saves an exponentiation.
In the elliptic curve setting, our construction is compatible with the
performance optimizations of [MRR21] and would therefore be competitive with
the currently fastest implementations of UC OT reported in [MRR21]. Further,
our OT is based on a PKE with pseudorandom public keys (PKE A) which,
           UC      Loss     Model          Com(R)       Com(S)
[CO15]     ✗       -        ROM            log |G|      log |G|
[MR19]     PKE A   O(q)     ROM            2|pk|        2|ct|
[CSW20]    DDH     O(q^2)   ROM, CRS       2 log |G|    log |G|
[MRR20]    PKE B   O(q^2)   ROM            |ct| + λ     |pk|
[MRR20]*   PKE A   O(q)     ROM            |ct| + λ     |pk|
[MRR21]    PKE B   O(q)     Ideal Cipher   |ct| + λ     |pk|
[MRR21]*   PKE A   O(1)     Ideal Cipher   |ct| + λ     |pk|
Ours       PKE A   O(1)     ROM            |pk| + 2λ    2|ct|


Fig. 1. We compare our construction with previous works. The depicted loss assumes 
tight multi-user security of the underlying PKE. We emphasize that the listed works 
realize different OT functionalities and therefore the comparison between the communi- 
cation should be interpreted with caution. PKE A stands for PKE with pseudorandom 
public keys and PKE B stands for PKE with uniform ciphertexts. q is the amount of 
adversarial random oracle queries (hash evaluations). [MRR20]*, [MRR21]* are slight 
adaptations of the original works to make them compatible with the PKE A setting. 


unlike PKEs with uniform ciphertexts (PKE B), can be efficiently instantiated 
with post-quantum PKEs, e.g. from codes or lattices. We could also use our tech- 
niques to construct an OT from a PKE with pseudorandom ciphertexts (PKE 
B), though it is unclear whether the tightness would still hold and it might 
require stronger assumptions such as the interactive DDH assumption [MR19] 
or oracle assumptions [BCJ+19, MRR21]. 

As shown in Fig. 1, our OT is currently the only OT among the most efficient
OTs that is tightly secure in the random oracle model. The main challenge is
typically security against a malicious receiver. Previous works suffer at least a
loss of $O(q)$, where $q$ is the number of adversarial hash evaluations. For a conservative
parameter choice, previous works need to start with a significantly higher
security level of the PKE or elliptic curve, which negatively impacts efficiency
and communication complexity, or alternatively use a stronger model such as
the ideal cipher model, as in the case of [MRR21]*.

We emphasize that Fig. 1 states the loss for [MRR20]* and [MRR21]* when
using a PKE with tight multi-user security. Using a (plain) single-user secure PKE,
the loss increases by a factor of $q$. This also holds for the loss stated for [MR19].


1.2 Technical Overview 


We follow an approach by Masny and Rindal [MR19]. They construct a two-round
OT in which the receiver starts by sending a message $r_0, r_1$. From this
message the sender can derive two public keys under which he encrypts the two
OT strings. The public keys are $pk_0 := r_1 + H(r_0)$ and $pk_1 := r_0 + H(r_1)$. When
following this approach, proving security against a malicious sender is typically
easy since the random oracle can be programmed such that the simulator knows
the secret keys for both public keys, which can then be used to extract the
malicious sender's string. The more challenging part is to prove security against
a malicious receiver $R^*$. Given that $R^*$ makes only two random oracle queries,
$r_0$ and $r_1$, the simulator can observe the first query; let it be $r_b$. Then, $b$ is
the extracted choice bit. Further, when the second query is made, the simulator
could pick a public key $pk^*$ of its choice and program the oracle $H$ such that
$H(r_{1-b}) := pk^* - r_b$ and thus $pk_{1-b} = pk^*$. If $R^*$ learns information about the
OT string $s_{1-b}$, he would then break the security of the PKE.

Unfortunately, when the malicious receiver makes many queries, it is not
clear how to program $H(r_{1-b})$ since any of the $q$ previous queries $\tilde{r}_1, \ldots, \tilde{r}_q$
could be the $r_b$ query. This would lead to the potential public keys
$pk_{1-b,1} = \tilde{r}_1 + H(r_{1-b}), \ldots, pk_{1-b,q} = \tilde{r}_q + H(r_{1-b})$.
We could guess $j \in [q]$ such that $r_b = \tilde{r}_j$,
but this would cause a loss of $q$.

Before explaining our construction, we first take an intermediate step. The
MR OT has similarities with a sequential OR proof [RST01,AOS02]. Instead, we
could follow the parallel OR proof paradigm [CDS94]. The public keys would
then be derived from a message $r, c_0, c_1$ and defined as $pk_0 := r + H(c_0)$ and
$pk_1 := r + H(c_1)$. This construction has similarities with the McQuoid, Rosulek and Roy
OT [MRR20]. As an additional constraint, we ask that $\tilde{H}(r) = c_0 + c_1$, where $\tilde{H}$ is
a second random oracle. When proving security against $R^*$, whenever $R^*$ makes
a query to $\tilde{H}$, the simulator samples a random $\hat{c}$ and programs $H(\hat{c} + c_i) := pk_i - r$
for any previous query $c_i$ to $H$, for a public key $pk_i$ of its choice. Since $\hat{c}$ is uniform,
it is very unlikely that $H$ has been programmed on this input for a previous
query. Now we could just rely on the multi-user security of the PKE rather than
trying to guess which of the previous queries corresponds to $r_b$. Nevertheless, $R^*$
could first query $\tilde{H}$ for $r$ and then query $H$ for $c_0, c_1$ such that $\tilde{H}(r) = c_0 + c_1$.
This would cause an issue in the programming strategy, which assumes that the
adversary first queries $c_0$ or $c_1$ to $H$. Further, this strategy does not seem to help
$R^*$ since by using a guessing strategy, we could show that by the security of the
PKE, $R^*$ cannot learn any of the OT strings. However, it seems that we cannot
show this via a tight reduction.

We resolve the issue via the following approach. We let the receiver send
$(r, c_0, c_1)$, and the public keys are defined as $pk_0 := r + H(\hat{c}_0)$ and $pk_1 := r + H(\hat{c}_1)$,
where $\hat{c}_0 := c_1 + H(r, c_0)$ and $\hat{c}_1 := c_0 + H(r, c_1)$. $\hat{c}_0$ and $\hat{c}_1$ could be seen as the
$r_0, r_1$ values of the MR OT. But rather than using them directly, we apply an
additional random oracle to them as a “correlation breaker”. A PKE scheme is
typically not tightly secure in a setting where an adversary $A$ can first suggest
$q$ shifts $r_1, \ldots, r_q$, then receives a public key $pk$, and finally tries to break IND-CPA
security under the public key $pk - r_j$, where $j \in [q]$ is chosen by $A$. However,
a correlation robust hash function $H$ [IKNP03] is tailored to such a setting and
maps all inputs $pk - r_1, \ldots, pk - r_q$ to strings that do not collide as long as $pk$ is
uniform and independent of $r_1, \ldots, r_q$. In our setting, we need something stronger
than correlation robustness since we also need programmability, such that we can
program these distinct strings to different public keys. Fortunately, a random
oracle provides both properties, such that for any choice of $r, c_0, c_1$ among the
random oracle queries of $R^*$, at least one of the public keys $pk_0$ and $pk_1$ will
correspond to a programmed key chosen by the simulator. When $q$ is the total
number of random oracle queries, there are at most $q^2$ choices for $r, c_0, c_1$ among
the queries. This is due to the fact that for any $b \in \{0,1\}$, $c_b$ is uniquely defined
by $r$ and $c_{1-b}$. Therefore, there will be at most $q^2$ choices of public keys $pk_0$,
$pk_1$, and hence the multi-user security of the PKE for $q^2$ users is sufficient to prove
security against a malicious receiver.

For the proof, it would be sufficient to just hash $r_0, r_1$ of the MR OT, though
in the actual protocol, we need to allow the receiver to control one of the public
keys. For this reason, we introduce $r$ to the protocol. Interestingly, our protocol
could be seen as a combination of sequential and parallel OR proof techniques.
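
The key derivation described above can be sketched in a toy setting as follows. Everything beyond the two defining equations is an assumption made for illustration: integers modulo a prime `Q` stand in both for the group of public keys and for the strings $c_0, c_1$, SHA-256 with the hypothetical tags `"pair"` and `"key"` stands in for the random oracle, and `receiver_message` is merely one way to satisfy the equations, not a procedure taken from this overview.

```python
import hashlib
import secrets

Q = 2**127 - 1  # toy prime modulus (assumption)


def H(*parts):
    """Random-oracle stand-in: hash the given parts to an integer mod Q."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def derive_keys(r, c0, c1):
    """Sender side: recompute pk_0 and pk_1 from the receiver message (r, c0, c1)."""
    c0_hat = (c1 + H("pair", r, c0)) % Q
    c1_hat = (c0 + H("pair", r, c1)) % Q
    pk0 = (r + H("key", c0_hat)) % Q
    pk1 = (r + H("key", c1_hat)) % Q
    return pk0, pk1


def receiver_message(b, pk):
    """Pick (r, c0, c1) such that the sender derives pk_b = pk (illustrative)."""
    c_b = secrets.randbelow(Q)
    c_b_hat = secrets.randbelow(Q)
    r = (pk - H("key", c_b_hat)) % Q             # forces r + H(c_b_hat) = pk
    c_other = (c_b_hat - H("pair", r, c_b)) % Q  # forces c_b_hat = c_other + H(r, c_b)
    return (r, c_b, c_other) if b == 0 else (r, c_other, c_b)


chosen_pk = 12345
for b in (0, 1):
    keys = derive_keys(*receiver_message(b, chosen_pk))
    print(b, keys[b] == chosen_pk)  # True for both choice bits
```

In this sketch the other key, $pk_{1-b}$, is an oracle output shifted by $r$, which is the value that the simulator programs in the proof outlined above.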


2 Preliminaries 


Notation. For $n \in \mathbb{N}$, we use $[n]$ to denote the set $\{1, \ldots, n\}$. We use $\lambda$ to denote
the security parameter. We write $x \leftarrow Y$ and $x \leftarrow X$ to sample $x$ from a distribution $Y$
or uniformly at random from a set $X$.

Let $\Pi$ be a protocol between two parties $S$ and $R$. For two (interactive)
algorithms $S'$, $R'$ that do not necessarily follow the protocol description of $\Pi$,
we use $[S', R']_\Pi$ to denote the interaction between $S'$ and $R'$ in protocol $\Pi$,
where $S'$ takes the role of $S$ and $R'$ the role of $R$. For an environment $D$, we use
$D([S', R']_\Pi)$ to denote an interaction of $D$ with $S'$, $R'$ who interact in $\Pi$. Here,
we follow the simple UC framework of [CCL15].

For a cyclic group $G$ of order $p \in \mathbb{N}$ with generator $g$, we use $[1]$ to denote $g$
and for $a, b \in \mathbb{N}$, $[a] + b[1] = [a + b]$. For $a, b \in \mathbb{Z}_q^n$, we use $\langle a, b \rangle$ to denote the
inner product between $a$ and $b$. For an oracle $O$ and an algorithm $A$, we use $A^O$
to denote $A$ when $A$ has query access to $O$.


Cryptographic Assumptions 

We recap the DDH and LWE problems below. Since we consider the UC 
setting, we need to consider non-uniform algorithms which receive an auxiliary 
input. 


Definition 1 (Decisional Diffie-Hellman (DDH)). A ppt algorithm $A$
solves the decisional Diffie-Hellman (DDH) problem for a group $G$ of order $p \in \mathbb{N}$
with generator $[1]$ with probability $\epsilon$ if for any polynomial auxiliary input $z$,


$$|\Pr[A(z, [1], [a], [b], [ab]) = 1] - \Pr[A(z, [1], [a], [b], [c]) = 1]| > \epsilon,$$
where $a, b, c \leftarrow \mathbb{Z}_p$.


Definition 2 (Learning with Errors (LWE)). A ppt algorithm $A$ solves the
Learning with Errors (LWE) problem for parameters $q, n \in \mathbb{N}$ and noise distribution
$\chi$ with probability $\epsilon$ if for any polynomial auxiliary input $z$


$$|\Pr[A^{O_{LWE}}(z) = 1] - \Pr[A^{O_U}(z) = 1]| > \epsilon,$$


where $O_{LWE}$ is an oracle that outputs samples of the form $a, \langle a, s \rangle + e$ with $a \leftarrow \mathbb{Z}_q^n$,
$e \leftarrow \chi$ and each sample uses the same secret $s \leftarrow \mathbb{Z}_q^n$. $O_U$ is the oracle that
outputs $a, u$ with $a \leftarrow \mathbb{Z}_q^n$, $u \leftarrow \mathbb{Z}_q$.
Public Key Encryption. We define public key encryption and its multi-user
security below. We emphasize that we consider a setting with only a single 
challenge ciphertext which is a weaker security notion than the commonly used 
multi-user security setting in which an adversary receives a challenge ciphertext 
for each public key. 


Definition 3 (Public Key Encryption). A public key encryption (PKE) is
a triplet of algorithms (Gen, Enc, Dec) and a message space $M$ with the following
syntax.


Gen: Takes as input $1^\lambda$ and outputs a key pair $(sk, pk)$.
Enc: Takes as input $pk$ and a message $m \in M$ and outputs a ciphertext $ct$.
Dec: Takes as input $sk$ and a ciphertext $ct$ and outputs a message $m$.


We require correctness and M-IND-CPA security.


Correctness: For any $m \in M$
$$\Pr[\mathrm{Dec}(sk, \mathrm{Enc}(pk, m)) = m] > 1 - \mathrm{negl},$$


where $(sk, pk) \leftarrow \mathrm{Gen}(1^\lambda)$.
n-Multi-User IND-CPA (M-IND-CPA): For any ppt adversary $A := (A_1, A_2)$
and any polynomial auxiliary input $z$
$$|\Pr[A_2(st, ct_0) = 1] - \Pr[A_2(st, ct_1) = 1]| < \mathrm{negl},$$


where for all $i \in [n]$, $(sk_i, pk_i) \leftarrow \mathrm{Gen}(1^\lambda)$, $(st, i^*, m_0, m_1) \leftarrow A_1(z, pk_1, \ldots, pk_n)$
and for all $b \in \{0,1\}$, $ct_b \leftarrow \mathrm{Enc}(pk_{i^*}, m_b)$.


In addition to the multi-user IND-CPA security, we also need that public 
keys are indistinguishable from uniform in the multi-user setting. 


Definition 4 (PKE with Pseudorandom Public Keys). For $n \in \mathbb{N}$, we
call a PKE scheme n-multi-user public key indistinguishable (M-IND-PK) over
group $G$ if for any ppt $A$ and polynomial auxiliary input $z$


$$|\Pr[A(z, pk_1, \ldots, pk_n) = 1] - \Pr[A(z, u_1, \ldots, u_n) = 1]| < \mathrm{negl},$$
where for all $i \in [n]$, $(sk_i, pk_i) \leftarrow \mathrm{Gen}(1^\lambda)$ and $u_i \leftarrow G$.


Oblivious Transfer. We use the simplified UC framework which is sufficient 
for full UC [CCL15]. Below, we define UC secure OT. 


Definition 5 (Ideal Oblivious Transfer Functionality). An ideal OT functionality
$F_{OT}$ interacts with two ppt parties $S$ and $R$ as follows. $F_{OT}$ takes $s_0, s_1$
from $S$. $F_{OT}$ takes $b$ from $R$ and returns $s_b$.


Definition 6 (Oblivious Transfer). We call a protocol $\Pi$ between two ppt
parties, a sender $S$ and a receiver $R$, oblivious transfer (OT) if at the end of
the protocol they have established a correlation in which $S$ holds strings $(s_0, s_1)$
and $R$ holds $(b, s_b)$. For security, we require two properties with respect to a
functionality $F_{OT}$.

Security Against a Malicious Sender: For any ppt adversary $A$, there exists
a ppt adversary $A'$ such that for any ppt environment $D$ and any polynomial
size auxiliary input $z$


$$|\Pr[D(z, [A, R]_\Pi) = 1] - \Pr[D(z, [A', F_{OT}]_\Pi) = 1]| = \mathrm{negl},$$


where all algorithms receive input $1^\lambda$. $R$ additionally receives input $b$.

Security Against a Malicious Receiver: For any ppt adversary $A$, there
exists a ppt adversary $A'$ such that for any ppt environment $D$ and any polynomial
size auxiliary input $z$


$$|\Pr[D(z, [S, A]_\Pi) = 1] - \Pr[D(z, [F_{OT}, A']_\Pi) = 1]| = \mathrm{negl},$$


where all algorithms receive input $1^\lambda$.


3 Public Key Encryption in the Multi User Setting 


We use this section to recap commonly known public key encryption schemes that 
are tightly secure in the multi-user setting. As a proof of concept, we consider 
ElGamal, Regev encryption and dual Regev encryption. 


Definition 7 (ElGamal). The ElGamal PKE over group $G$ with order $p \in \mathbb{N}$
and generator $[1]$ with message space $M := G$ has the following syntax.


Gen([1]) → (pk, sk): Sample $x \leftarrow \mathbb{Z}_p$ and output $pk := [x]$ and $sk := x$.
Enc([1], pk, m) → (ct_1, ct_2): Sample $r \leftarrow \mathbb{Z}_p$ and output $ct_1 := [r]$, $ct_2 := r \cdot pk + m$.
Dec([1], sk, ct) → m: Output $m := ct_2 - sk \cdot ct_1$.


It is straightforward to see that ElGamal is perfectly correct. Let us recap 
that it is tightly secure in the multi-user setting. Due to the fact that the public 
keys are uniform over G, ElGamal is perfectly n-M-IND-PK secure. 
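The syntax of Definition 7 can be illustrated with a minimal Python sketch. The parameters below (the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 4) are an assumption chosen purely for readability and offer no security. The paper writes the group additively, whereas the sketch uses the equivalent multiplicative notation, so $[x]$ becomes `G**x mod P` and $r \cdot pk + m$ becomes `pk**r * m mod P`.

```python
import secrets

# Toy parameters (assumption, NOT secure): the order-11 subgroup of Z_23^*
# generated by 4. Additive notation [x] of the text corresponds to G**x mod P.
P, Q, G = 23, 11, 4

def gen():
    """Gen: sample x <- Z_q and output pk = [x], sk = x."""
    sk = secrets.randbelow(Q)
    return pow(G, sk, P), sk

def enc(pk, m):
    """Enc: sample r <- Z_q and output ct1 = [r], ct2 = r*pk + m."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), (pow(pk, r, P) * m) % P

def dec(sk, ct):
    """Dec: output ct2 - sk*ct1 (a division by ct1**sk in this notation)."""
    ct1, ct2 = ct
    # ct1 has order dividing Q, so ct1**(Q - sk) is the inverse of ct1**sk.
    return (ct2 * pow(ct1, Q - sk, P)) % P

if __name__ == "__main__":
    pk, sk = gen()
    m = pow(G, 7, P)  # messages are group elements
    assert dec(sk, enc(pk, m)) == m
```

Since `pk` is `G` raised to a uniform exponent, it is a uniform element of the subgroup, which is the property behind the perfect M-IND-PK security noted above.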


Lemma 1. Let $G$ be of prime order, DDH be $\epsilon$ hard over $G$ and $n$ polynomial,
then ElGamal over $G$ is $2\epsilon$ $n$-M-IND-CPA secure.


Proof. The proof follows straightforwardly from the random self-reducibility of
the DDH assumption. The reduction for parameter $d \in \{0,1\}$ receives a DDH
challenge $[a], [b], [c]$ and samples $r_i \leftarrow \mathbb{Z}_p$ for all $i \in [n]$. It forwards $z$ and
$pk_1 := r_1[a], \dots, pk_n := r_n[a]$ to $\mathcal{A}$, which tries to break ElGamal. When $\mathcal{A}$ sends
$i^*, m_0, m_1$, the reduction sends $ct := ([b], r_{i^*}[c] + m_d)$. The reduction outputs the
output of $\mathcal{A}$.

When $[c] = [ab]$, $ct$ is an encryption of $m_d$, i.e. $ct := ct_d$, while when $c$ is
uniform, $ct$ encrypts a uniform message, i.e. $ct := ct_u$. If $\mathcal{A}$ distinguishes $ct_d$ from
$ct_u$ with probability $\epsilon'$, the reduction solves DDH with probability $\epsilon'$. Assuming
that DDH is $\epsilon$ hard, $\mathcal{A}$ cannot distinguish $ct_d$ from $ct_u$ with $\epsilon' > \epsilon$ for any
$d \in \{0,1\}$ and it cannot distinguish $ct_0$ from $ct_1$ with $\epsilon' > 2\epsilon$.




Definition 8 (Regev Encryption [Reg05]). Regev encryption with the parameters
$q, n, m \in \mathbb{N}$ with $m > n \log q$ and message space $\{0,1\}$ has the following
syntax.


Gen($1^\kappa$) → (pk, sk): Sample $s \leftarrow \mathbb{Z}_q^n$, $A \leftarrow \mathbb{Z}_q^{m \times n}$, $e \leftarrow \chi^m$ and output
$pk := (A, As + e)$ and $sk := s$.

Enc(pk, m) → (ct_1, ct_2): Sample $R \leftarrow \{0,1\}^{m \times m}$ and output $ct_1 := R \cdot pk_1$,
$ct_2 := R \cdot pk_2 + m \lfloor q/2 \rfloor$.

Dec(sk, ct) → m: Compute $M := ct_2 - ct_1 \cdot sk$ and output $m := \lfloor \tfrac{2}{q} M \rceil$.


For a proper choice of $q$, $m$ and $\chi$, Regev encryption will be correct.
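To see why a proper choice of parameters gives correctness, the following Python sketch encrypts a single bit. It is a simplified variant under stated assumptions: the matrix $R$ is replaced by one binary row `r`, the noise distribution $\chi$ is replaced by the hypothetical stand-in "uniform on $\{-1, 0, 1\}$", and the parameters are far too small to be secure. With these choices the decryption noise $|r \cdot e|$ is at most $m = 40 < q/4$, so rounding always recovers the bit.

```python
import secrets

# Toy parameters (assumption, far too small to be secure).
N, Q = 4, 257
M = 40  # m > n*log2(q) = 32, and m < q/4 so that decryption is always correct

def small_noise():
    """Hypothetical stand-in for the noise distribution chi: uniform on {-1, 0, 1}."""
    return secrets.randbelow(3) - 1

def gen():
    s = [secrets.randbelow(Q) for _ in range(N)]
    A = [[secrets.randbelow(Q) for _ in range(N)] for _ in range(M)]
    b = [(sum(a * x for a, x in zip(row, s)) + small_noise()) % Q for row in A]
    return (A, b), s  # pk = (A, As + e), sk = s

def enc(pk, bit):
    A, b = pk
    r = [secrets.randbelow(2) for _ in range(M)]  # a single binary row in place of R
    ct1 = [sum(r[i] * A[i][j] for i in range(M)) % Q for j in range(N)]
    ct2 = (sum(r[i] * b[i] for i in range(M)) + bit * (Q // 2)) % Q
    return ct1, ct2

def dec(sk, ct):
    ct1, ct2 = ct
    v = (ct2 - sum(c * x for c, x in zip(ct1, sk))) % Q  # = r.e + bit*floor(q/2) mod q
    return 1 if Q // 4 < v < 3 * Q // 4 else 0  # round to the nearest multiple of q/2

if __name__ == "__main__":
    pk, sk = gen()
    assert dec(sk, enc(pk, 0)) == 0 and dec(sk, enc(pk, 1)) == 1
```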


Lemma 2. Let LWE be $\epsilon$ hard and $n$ polynomial, then Regev encryption is $2\epsilon$
$n$-M-IND-CPA and $\epsilon$ $n$-M-IND-PK secure.


Proof. We first show M-IND-CPA security. The reduction for parameter $d \in \{0,1\}$
receives access to an oracle $O$ that it uses to generate $A_i, b_i$ for all $i \in [n]$.
It sets $pk_i := (A_i, b_i + A_i s_i)$ for $s_i \leftarrow \mathbb{Z}_q^n$ and forwards them to $\mathcal{A}$. After $\mathcal{A}$ sends
$(i^*, m_0, m_1)$, the reduction samples $R \leftarrow \{0,1\}^{m \times m}$ and sends
$ct := (R A_{i^*}, R(b_{i^*} + A_{i^*} s_{i^*}) + m_d \lfloor q/2 \rfloor)$. The reduction outputs the output of $\mathcal{A}$.

When $O = O_{LWE}$, $ct$ is an encryption of $m_d$, i.e. $ct := ct_d$, while when $O = O_U$,
$ct$ is by the leftover hash lemma uniform, i.e. $ct := ct_u$. If $\mathcal{A}$ distinguishes $ct_d$ from
$ct_u$ with probability $\epsilon'$, the reduction solves LWE with probability $\epsilon'$. Assuming
that LWE is $\epsilon$ hard, $\mathcal{A}$ cannot distinguish $ct_d$ from $ct_u$ with $\epsilon' > \epsilon$ for any
$d \in \{0,1\}$ and it cannot distinguish $ct_0$ from $ct_1$ with $\epsilon' > 2\epsilon$.

Let us now consider the M-IND-PK security. The reduction defines $pk_i$ as
previously. When $O = O_{LWE}$, then $pk_i$ is a proper public key and when $O = O_U$,
then the public key is uniform. If $\mathcal{A}$ can distinguish them, it solves LWE.


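Definition 9 (Dual Regev Encryption [GPV08]). Dual Regev encryption
with the parameters $q, n, m \in \mathbb{N}$ with $m > n \log q$ and message space $\{0,1\}$ has
the following syntax.


Gen($1^\kappa$) → (pk, sk): Sample $R \leftarrow \{0,1\}^{m \times m}$, $A \leftarrow \mathbb{Z}_q^{m \times n}$ and output
$pk := (A, RA)$ and $sk := R$.

Enc(pk, m) → (ct_1, ct_2): Sample $s \leftarrow \mathbb{Z}_q^n$, $e_1, e_2 \leftarrow \chi^m$, $R' \leftarrow \{0,1\}^{m \times m}$ and
output $ct_1 := pk_1 \cdot s + e_1$, $ct_2 := pk_2 \cdot s + R' e_2 + m \lfloor q/2 \rfloor$.

Dec(sk, ct) → m: Compute $M := ct_2 - sk \cdot ct_1$ and output $m := \lfloor \tfrac{2}{q} M \rceil$.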


Correctness follows in the same way as in Regev encryption. By the leftover 
hash lemma, the public key is statistically indistinguishable from uniform and 
therefore dual Regev encryption is M-IND-PK secure. 


Lemma 3. Let LWE be $\epsilon$ hard and $n$ polynomial, then dual Regev encryption is
$2\epsilon$ $n$-M-IND-CPA secure.


Proof. The reduction for parameter $d \in \{0,1\}$ receives access to an oracle $O$
that it uses to generate $A_i, b_i$ for all $i \in [n]$. It sets $pk_i := (A_i, R_i A_i)$ for
$R_i \leftarrow \{0,1\}^{m \times m}$ and forwards them to $\mathcal{A}$. After $\mathcal{A}$ sends $(i^*, m_0, m_1)$, the reduction
sends $ct := (b_{i^*}, R_{i^*} b_{i^*} + m_d \lfloor q/2 \rfloor)$. The reduction outputs the output of $\mathcal{A}$.

When $O = O_{LWE}$, $ct$ is an encryption of $m_d$, i.e. $ct := ct_d$, while when $O = O_U$,
$ct$ is by the leftover hash lemma (with leakage $R e_2$) uniform, i.e. $ct := ct_u$. If
$\mathcal{A}$ distinguishes $ct_d$ from $ct_u$ with probability $\epsilon'$, the reduction solves LWE with
probability $\epsilon'$. Assuming that LWE is $\epsilon$ hard, $\mathcal{A}$ cannot distinguish $ct_d$ from $ct_u$
with $\epsilon' > \epsilon$ for any $d \in \{0,1\}$ and it cannot distinguish $ct_0$ from $ct_1$ with $\epsilon' > 2\epsilon$.


We remark that our security proofs for multi user security require more LWE 
samples than the proofs of the standard PKE security notions. We emphasize 
that there are well known techniques to generate many LWE samples from a fixed 
amount of LWE samples [Reg05,ILL89]. Since such a rerandomization increases 
the noise level, one needs to start with a lower noise level which decreases the 
hardness of LWE slightly such that the approximation factor of the underlying
SVP instance increases by a factor of $O(n^{1/2})$.


4 Oblivious Transfer from PKE 


Theorem 1. Let PKE be M-IND-CPA and M-IND-PK secure and correct.
Then Protocol 2 is a UC secure OT in the ROM.


Proof. Given the correctness of PKE, an honest sender and receiver will establish
correlation $(s_0, s_1)$, $(b, s_b)$ with overwhelming probability.


We now focus on security against a malicious sender. 


Lemma 4. Let PKE be $\epsilon_u$ 1-M-IND-PK secure. Then, for any ppt adversary $\mathcal{A}$,
there exists a ppt adversary $\mathcal{A}'$ such that for any ppt environment $\mathcal{D}$ and any
polynomial size auxiliary input $z$

\[ \left| \Pr[\mathcal{D}(z, [\mathcal{A}, R]_\Pi) = 1] - \Pr[\mathcal{D}(z, [\mathcal{A}', \mathcal{F}_{OT}]_\Pi) = 1] \right| \le \epsilon_u, \]

where all algorithms receive input $1^\kappa$. $R$ additionally receives input $b$.


Proof. We construct a receiver $R'$ that follows the description of $R$ by sampling
$(pk_b, sk_b) \leftarrow \mathsf{Gen}(1^\kappa)$, $\hat{c}_b \leftarrow \{0,1\}^\kappa$, $c_b \leftarrow \{0,1\}^\kappa$, computing $r := pk_b - \hat{H}_b(\hat{c}_b)$,
$c_{1-b} := \hat{c}_b \oplus H_b(r, c_b)$. Unlike $R$, $R'$ computes $\hat{c}_{1-b} := c_b \oplus H_{1-b}(r, c_{1-b})$, samples
$(pk_{1-b}, sk_{1-b}) \leftarrow \mathsf{Gen}(1^\kappa)$ and programs $\hat{H}_{1-b}(\hat{c}_{1-b}) := pk_{1-b} - r$. Otherwise, $R'$
follows the description of $R$.

Notice that in case of $R$, $r + \hat{H}_{1-b}(\hat{c}_{1-b})$ is uniform while in case of $R'$, it has
the distribution of a public key generated by Gen. If $\mathcal{D}$ can distinguish $[\mathcal{A}, R]$
from $[\mathcal{A}, R']$, then $\mathcal{D}$ can be used to break the 1-M-IND-PK security of PKE
with probability $\epsilon_u$ as follows. The reduction receives a 1-M-IND-PK challenge
$pk$ and sets $pk_{1-b} := pk$. When $pk$ is uniform, it simulates $R$ and otherwise $R'$.
Therefore,

\[ \left| \Pr[\mathcal{D}(z, [\mathcal{A}, R]_\Pi) = 1] - \Pr[\mathcal{D}(z, [\mathcal{A}, R']_\Pi) = 1] \right| \le \epsilon_u. \]




Oblivious Transfer Protocol


Primitives:
- PKE scheme (Gen, Enc, Dec) with pseudorandom public keys in $G$.
- Random oracles
  • $H_0, H_1 : G \times \{0,1\}^\kappa \to \{0,1\}^\kappa$.
  • $\hat{H}_0, \hat{H}_1 : \{0,1\}^\kappa \to G$.
Common input: $1^\kappa$.
Sender $S$ input: $s_0, s_1$.
Receiver $R$ input: $b \in \{0,1\}$.


1. $R$ samples $(pk_b, sk_b) \leftarrow \mathsf{Gen}(1^\kappa)$, $\hat{c}_b \leftarrow \{0,1\}^\kappa$, $c_b \leftarrow \{0,1\}^\kappa$, computes
   - $r := pk_b - \hat{H}_b(\hat{c}_b)$
   - $c_{1-b} := \hat{c}_b \oplus H_b(r, c_b)$
   and sends $(r, c_0, c_1)$.
2. $S$ computes
   - $\hat{c}_0 := c_1 \oplus H_0(r, c_0)$, $\hat{c}_1 := c_0 \oplus H_1(r, c_1)$,
   - $pk_0 := r + \hat{H}_0(\hat{c}_0)$, $pk_1 := r + \hat{H}_1(\hat{c}_1)$,
   - $ct_0 := \mathsf{Enc}(pk_0, s_0)$, $ct_1 := \mathsf{Enc}(pk_1, s_1)$,
   and sends $(ct_0, ct_1)$.
3. $R$ computes $s_b := \mathsf{Dec}(sk_b, ct_b)$.


Fig. 2. Oblivious Transfer in the Random Oracle Model. $(+, -)$ are used to denote the
operations in $G$. $\oplus$ is the xor operation over $\{0,1\}^\kappa$.
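The three messages of Fig. 2 can be traced with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the PKE is toy ElGamal over the quadratic residues modulo the safe prime 2039 (not secure), the random oracles are modelled with SHA-256 under hypothetical domain-separation prefixes, and the function names `receiver_round1`, `sender_round2` and `receiver_round3` are invented here. The group is written multiplicatively, so the $+$ and $-$ of the figure become multiplication and division modulo `P`.

```python
import hashlib
import secrets

# Toy ElGamal group (assumption, NOT secure): quadratic residues mod the safe
# prime P = 2*Q + 1, written multiplicatively.
P, Q, G = 2039, 1019, 4
KAPPA = 16  # byte length of the strings c_0, c_1 (hypothetical choice)

def inv(x):
    return pow(x, P - 2, P)

def H(d, r, c):
    """Random oracle H_d : G x {0,1}^k -> {0,1}^k, modelled with SHA-256."""
    return hashlib.sha256(b"H%d|%d|" % (d, r) + c).digest()[:KAPPA]

def H_hat(d, c_hat):
    """Random oracle H^_d : {0,1}^k -> G (hash into Z_P^*, then square)."""
    h = int.from_bytes(hashlib.sha256(b"Hhat%d|" % d + c_hat).digest(), "big")
    return pow(h % (P - 1) + 1, 2, P)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def gen():
    sk = secrets.randbelow(Q)
    return pow(G, sk, P), sk

def enc(pk, m):
    r = secrets.randbelow(Q)
    return pow(G, r, P), pow(pk, r, P) * m % P

def dec(sk, ct):
    return ct[1] * inv(pow(ct[0], sk, P)) % P

def receiver_round1(b):
    pk_b, sk_b = gen()
    c_hat_b, c_b = secrets.token_bytes(KAPPA), secrets.token_bytes(KAPPA)
    r = pk_b * inv(H_hat(b, c_hat_b)) % P   # r := pk_b - H^_b(c^_b)
    c_other = xor(c_hat_b, H(b, r, c_b))    # c_{1-b} := c^_b xor H_b(r, c_b)
    c = (c_b, c_other) if b == 0 else (c_other, c_b)
    return sk_b, (r, c[0], c[1])

def sender_round2(msg, s0, s1):
    r, c0, c1 = msg
    c_hat0, c_hat1 = xor(c1, H(0, r, c0)), xor(c0, H(1, r, c1))
    pk0, pk1 = r * H_hat(0, c_hat0) % P, r * H_hat(1, c_hat1) % P
    return enc(pk0, s0), enc(pk1, s1)

def receiver_round3(b, sk_b, cts):
    return dec(sk_b, cts[b])

if __name__ == "__main__":
    s0, s1 = pow(G, 5, P), pow(G, 9, P)  # sender inputs are group elements
    sk, msg = receiver_round1(1)
    assert receiver_round3(1, sk, sender_round2(msg, s0, s1)) == s1
```

The sender recomputes $\hat{c}_b$ from $(r, c_0, c_1)$ and therefore derives exactly the receiver's $pk_b$, while the other key $pk_{1-b}$ is an oracle output shifted by $r$, for which the receiver knows no secret key.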


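Based on $R'$, we can construct an adversary $\mathcal{A}'$ which interacts with $\mathcal{A}$, relays
all interaction between $\mathcal{A}$ and $\mathcal{D}$ and needs to submit $s_0$ and $s_1$ to $\mathcal{F}_{OT}$. $\mathcal{A}'$ follows
the process of $R'$ when constructing $r, c_0, c_1$ that defines $pk_0$ and $pk_1$. As $R'$,
$\mathcal{A}'$ knows both $sk_0$ and $sk_1$, which $\mathcal{A}'$ uses to decrypt $ct_0$ and $ct_1$ to obtain $s_0$
and $s_1$. Since $\mathcal{A}'$ follows the description of $R'$, it leads to the same interaction
between $\mathcal{A}$ and $\mathcal{D}$. Therefore

\[ \Pr[\mathcal{D}(z, [\mathcal{A}, R']_\Pi) = 1] = \Pr[\mathcal{D}(z, [\mathcal{A}', \mathcal{F}_{OT}]_\Pi) = 1], \]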


which concludes the proof of the lemma. 


We conclude the theorem with the following lemma that establishes security 
against a malicious receiver. 


Lemma 5. Let PKE be $\epsilon_u$ $q^2$-M-IND-PK and $\epsilon_c$ $q^2$-M-IND-CPA secure. Then,
for any ppt adversary $\mathcal{A}$ making at most $q$ random oracle queries to $H_0$, $H_1$, $\hat{H}_0$
and $\hat{H}_1$ combined, there exists a ppt adversary $\mathcal{A}'$ such that for any ppt
environment $\mathcal{D}$ and any polynomial size auxiliary input $z$

\[ \left| \Pr[\mathcal{D}(z, [S, \mathcal{A}]_\Pi) = 1] - \Pr[\mathcal{D}(z, [\mathcal{F}_{OT}, \mathcal{A}']_\Pi) = 1] \right| \le \epsilon_u + \epsilon_c + \frac{q^2}{2^\kappa}, \]

where all algorithms receive input $1^\kappa$.

Proof. For simplicity, we assume that when $\mathcal{A}$ sends $r, c_0, c_1$ during the protocol
to the sender, it has queried the random oracles for $H_0(r, c_0)$, $H_1(r, c_1)$, $\hat{H}_0(\hat{c}_0)$
and $\hat{H}_1(\hat{c}_1)$. We can assume this without loss of generality by making at most 4
additional queries and setting the amount of queries to $\hat{q} = q + 4$. Since this is
not significant for our overall bound, we identify $\hat{q}$ with $q$ in the following. We
also assume without loss of generality that $\mathcal{A}$ queries an oracle only once per
input.

We define three intermediate algorithms $S_1, S_2, S_3$ playing the role of sender
$S$. $S_1$ is identical to $S$ except that it simulates random oracles $\hat{H}_0, \hat{H}_1$ as follows.
For all $i \in [q]$ and $j \in [q]$ (where $q$ is the amount of queries), it samples $pk_{i,j} \leftarrow G$.
Whenever $\mathcal{A}$ makes a query $r_i, c_{i,d}$ to $H_d$ for $i \in [q]$ and $d \in \{0,1\}$, $S_1$ samples
$H_d(r_i, c_{i,d}) \leftarrow \{0,1\}^\kappa$ and does the following for any $j \in [q]$ with $j < i$ for which
the $j$th query is a query $r_j, c_{j,1-d}$ to $H_{1-d}$ with $r_j = r_i$.


1. Compute $\hat{c}_{i,j,d} := c_{j,1-d} \oplus H_d(r_i, c_{i,d})$.
2. If $\hat{H}_d(\hat{c}_{i,j,d})$ is defined (through programming or a query), abort. Otherwise,
   program $\hat{H}_d(\hat{c}_{i,j,d}) := pk_{i,j} - r_i$.


Afterwards, $S_1$ answers the query with $H_d(r_i, c_{i,d})$.

When $\mathcal{A}$ sends $r, c_0, c_1$, $S_1$ computes $pk_0, pk_1$ in the same way as $S$. $S_1$ defines
$b^*$ such that $pk_{1-b^*} = pk_{i,j}$ for some $i \in [q]$ and $j \in [q]$. If no such $b^*, i, j$ exists, $S_1$
aborts. Otherwise, it concludes the protocol according to the description of $S$.

Let us now consider whether an environment $\mathcal{D}$ can distinguish $[\mathcal{A}, S]$ from
$[\mathcal{A}, S_1]$. Since the $pk_{i,j}$ are uniform in $G$, the output distribution of $\hat{H}_d$, in particular
for every point $\hat{H}_d(\hat{c}_{i,j,d}) := pk_{i,j} - r_i$, is uniform over $G$ in both settings. Other
than that, $S_1$ differs from $S$ by two abort conditions: one during queries to $H_d$
and one after seeing $(r, c_0, c_1)$. Let us assume that $S_1$ aborts during a query to
$H_d$. This implies that either $\mathcal{A}$ has queried $\hat{H}_d$ for $\hat{c}_{i,j,d} = c_{j,1-d} \oplus H_d(r_i, c_{i,d})$ for
some $j \in [q]$ or there exists a $j \in [q]$ and a $j' \in [q] \setminus \{j\}$ with $c_{j,1-d} \oplus H_d(r_i, c_{i,d}) =
c_{j',1-d} \oplus H_d(r_i, c_{i,d})$. In the former case, $\mathcal{A}$ would predict $H_d(r_i, c_{i,d}) = c_{j,1-d} \oplus
\hat{c}_{i,j,d}$, which happens for each query with probability at most $\frac{q}{2^\kappa}$. In the latter
case, $c_{j,1-d} = c_{j',1-d}$ and thus $\mathcal{A}$ would make the same query twice, which we
have excluded w.l.o.g. since every adversary queries any input at most once.¹

The second abort condition never triggers for the following reason. Since $\mathcal{A}$
sends $r, c_0, c_1$, he will query $r, c_0$ to $H_0$ and $r, c_1$ to $H_1$. Let $b^* \in \{0,1\}$ be such
that $\mathcal{A}$ makes query $r, c_{b^*}$ before $r, c_{1-b^*}$. When $\mathcal{A}$ makes query $c_{1-b^*}$, $c_{b^*}$ will
therefore be defined and $S_1$ will program $\hat{H}_{1-b^*}(c_{b^*} \oplus H_{1-b^*}(r, c_{1-b^*})) := pk_{i,j} - r$
for some $i, j \in [q]$. By the definition of $pk_{1-b^*}$, $pk_{1-b^*} = pk_{i,j}$. Thus, we obtain
the bound

\[ \left| \Pr[\mathcal{D}(z, [S, \mathcal{A}]_\Pi) = 1] - \Pr[\mathcal{D}(z, [S_1, \mathcal{A}]_\Pi) = 1] \right| \le \frac{q^2}{2^\kappa}. \]


1 In case an adversary is allowed to query inputs multiple times, $S_1$ would simply not
try to program the oracle on an input that the adversary has queried already and
send the output that is consistent with the previous query for that input.

$S_2$ is identical to $S_1$ except that it samples $(pk_{i,j}, sk_{i,j}) \leftarrow \mathsf{Gen}(1^\kappa)$ for any
$i, j \in [q]$. If there is an environment $\mathcal{D}$ that can distinguish $[\mathcal{A}, S_2]$ from $[\mathcal{A}, S_1]$,
then we can break the $q^2$-M-IND-PK security of PKE, i.e. that public keys are hard to
distinguish from uniform, as follows. The reduction receives $q^2$ challenge
public keys $\overline{pk}_{i,j}$ for $i, j \in [q]$. Instead of sampling $pk_{i,j}$, it sets $pk_{i,j} := \overline{pk}_{i,j}$.

When the challenge public keys are uniform, the reduction simulates $S_1$ and
otherwise (when the challenge public keys are distributed according to Gen) $S_2$.
Therefore,

\[ \left| \Pr[\mathcal{D}(z, [S_1, \mathcal{A}]_\Pi) = 1] - \Pr[\mathcal{D}(z, [S_2, \mathcal{A}]_\Pi) = 1] \right| \le \epsilon_u. \]

Our next intermediate sender $S_3$ follows the description of $S_2$ except that
after receiving $r, c_0, c_1$ from $\mathcal{A}$, it defines $ct_{1-b^*} := \mathsf{Enc}(pk_{1-b^*}, 0)$. If there is
an environment $\mathcal{D}$ that can distinguish $[\mathcal{A}, S_2]$ from $[\mathcal{A}, S_3]$, we can break the
$q^2$-M-IND-CPA security of PKE as follows. The reduction receives $q^2$ challenge
public keys $\overline{pk}_{i,j}$ for $i, j \in [q]$. As previously, it sets $pk_{i,j} := \overline{pk}_{i,j}$. It then follows
the description of $S_2$ until it defines $b^*$ and can compute $pk_{1-b^*} = pk_{i,j}$ for some
$i, j \in [q]$. The reduction sends $((i, j), m_0 := s_{1-b^*}, m_1 := 0)$ to the M-IND-CPA
challenger and receives back $ct^*$. It then sets $ct_{1-b^*} := ct^*$. When $ct^*$ encrypts
$s_{1-b^*}$, the reduction simulates $S_2$ and otherwise $S_3$. Therefore,

\[ \left| \Pr[\mathcal{D}(z, [S_2, \mathcal{A}]_\Pi) = 1] - \Pr[\mathcal{D}(z, [S_3, \mathcal{A}]_\Pi) = 1] \right| \le \epsilon_c. \]

Based on $S_3$, we can define $\mathcal{A}'$ which interacts with $\mathcal{A}$, relays all interaction
between $\mathcal{A}$ and $\mathcal{D}$ and submits $b^*$ to $\mathcal{F}_{OT}$ and then receives $s_{b^*}$, which is used
to generate $ct_{b^*}$. Since $\mathcal{A}'$ follows the description of $S_3$, it leads to the same
interaction between $\mathcal{A}$ and $\mathcal{D}$. Therefore, we can conclude the lemma with

\[ \Pr[\mathcal{D}(z, [S_3, \mathcal{A}]_\Pi) = 1] = \Pr[\mathcal{D}(z, [\mathcal{F}_{OT}, \mathcal{A}']_\Pi) = 1]. \]


Lemmas 4 and 5 are sufficient to establish Theorem 1. 


Acknowledgements. We thank James Bartusek for a discussion that led to the tech- 
niques presented in this paper. 
Abstract. Secure multi-party computation (MPC) allows participating parties to
jointly compute a function over their inputs while keeping them private. In
particular, MPC based on additive secret sharing has been widely studied as a tool
to obtain efficient protocols secure against a dishonest majority, including the
important two-party case. In this paper, we propose a two-party protocol for an
exponentiation functionality based on an additive secret sharing scheme. Our
proposed protocol aims to securely compute a public base exponentiation $a^x \bmod p$
for some prime $p$, where the exponent $x \in \mathbb{Z}_p$ is a (shared) secret and the base
$a \in \mathbb{Z}_p$ is public. Our protocol is based on a new simple but efficient approach
involving quotient transfer that allows the parties to perform the most expensive
part of the computation locally, and requires 3 rounds and 4 invocations of
multiplication. As an intermediate primitive for our efficient two-party
exponentiation protocol, we propose an efficient modulus conversion protocol. This protocol
might be of independent interest.


1 Introduction 


1.1 Background 


Secure multi-party computation (MPC) allows a set of parties to compute an arbi- 
trary function of their inputs without revealing the private inputs to each other, except 
for what can be obtained from the output of the function. While MPC is appli- 
cable in many different settings, a line of research which has attracted a lot of 
attention recently, is the application of MPC in the area of machine learning (e.g., 
see [RSC+19, MLS+20, CVA18, KRC+20, AA20, CCPS19, BCP+20, CRS20]). One of
the technical issues when combining MPC and deep learning is that deep learning
requires various operations which are difficult to implement in MPC efficiently (e.g.,
division, reciprocal operation, square root, and exponentiation).


Exponentiation in MPC. In this paper, we deal with how to implement an
exponentiation functionality in MPC based on secret sharing. Exponentiation is a frequently
used function in machine learning, and is also useful e.g. for protecting secret keys
in distributed systems such as Blockchain. In the latter context, MPC is used to
generate discrete logarithm-based digital signatures while distributing the secret key,
which is referred to as threshold signatures or distributed signatures [GGN16, Lin17,
WWW+14].

Exponentiation MPC protocols for different settings have been proposed so far. The
first one is public base: the base is public and the exponent is secret; the second one is
public exponent: the base is private and the exponent is public; and the last one is private
exponentiation: both the base and the exponent are privately held. In this paper, we focus
on the public base variant. This variant shows up in many real-world applications. For
example, in the deep learning setting mentioned above, it is frequently required that the
value $e^x$ is computed, where $e$ is Napier’s constant.

In the following, in order to clarify our goal, we highlight three properties of expo- 
nentiation MPC protocols: (based on) additive secret sharing/Shamir’s secret shar- 
ing, honest-majority/dishonest-majority, and with/without bit-decomposition. Firstly, 
we note that our goal is to construct an efficient public base exponentiation protocol with- 
out using bit-decomposition based on additive secret sharing in the dishonest-majority 
setting. 


Additive Secret Sharing vs. Shamir’s Secret Sharing. When constructing MPC pro- 
tocols based on secret sharing, we mainly have two types of secret sharing: additive 
secret sharing and Shamir’s secret sharing. 

Additive secret sharing [ISO] is defined over a finite additive group $(G, +)$. In
additive secret sharing for $n$ parties, a secret $x \in G$ is randomly divided into shares
$[x]^1, \dots, [x]^n$ such that $[x]^1 + \cdots + [x]^n = x$. The group chosen for additive
secret sharing determines the form of the shares and the type of circuit that the
parties can evaluate.
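As a concrete illustration of the definition above, the following Python sketch shares a secret over the additive group $\mathbb{Z}_p$. The helper names `share` and `reconstruct` are hypothetical, and the modulus used in the example is an arbitrary small prime.

```python
import secrets

def share(x, n, p):
    """Split x in Z_p into n additive shares that sum to x modulo p."""
    shares = [secrets.randbelow(p) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % p)  # last share fixes the sum
    return shares

def reconstruct(shares, p):
    """Recover the secret by adding all shares in Z_p."""
    return sum(shares) % p

def add_shared(xs, ys, p):
    """Addition is local: party i adds its own shares of x and y."""
    return [(a + b) % p for a, b in zip(xs, ys)]

if __name__ == "__main__":
    p = 101
    xs, ys = share(42, 3, p), share(77, 3, p)
    assert reconstruct(xs, p) == 42
    assert reconstruct(add_shared(xs, ys, p), p) == (42 + 77) % p
```

Any $n - 1$ of the shares are uniformly distributed and independent of the secret, which is why all $n$ shares are needed for reconstruction.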

Compared with Shamir’s secret sharing, which is based on polynomial interpola- 
tion, one of the advantages of MPC based on additive secret sharing is the compatibility 
with “dishonest-majority” MPC frameworks. In other words, MPC based on additive 
secret sharing can be easily integrated with existing dishonest-majority MPC frame- 
works [DPSZ12, DKL+13, KPR18, ALSZ15,KOS16], and thus can utilize the ecosys- 
tems of these frameworks such as other efficient MPC protocols or cheater detection 
functionality. This is explained in more detail below. 


Dishonest-Majority vs. Honest-Majority. Honest/dishonest-majority is a criterion of
MPC security regarding the number of corrupted parties among all participants. We
call an MPC protocol secure against an honest (resp. dishonest) majority if the
number of corrupted parties is less than (resp. equal to or more than) half of the total
number of parties. Note that security in the dishonest-majority setting is much harder to
achieve than in the honest-majority setting. In the fully information-theoretic setting,
there exists an impossibility result showing that MPC for arbitrary functions cannot be
constructed in the dishonest-majority setting [BGW88]. To achieve security against a
dishonest majority, previous works often introduce some computational
assumptions, or the “online/offline” paradigm described below.
A Bit-Decomposition-Based Approach. A common approach for realizing some
MPC functionalities is to firstly compute a binary representation of the input in
secret-shared form, and then construct an MPC protocol for the evaluation of the
functionality in question via a Boolean circuit or a “mixed” Boolean and arithmetic circuit.
The first step is known as bit-decomposition, and the usefulness of this approach was
illustrated by Damgård et al. in [DFK+06] who proposed MPC protocols for equality
testing, comparison, and exponentiation. Concretely, in a bit-decomposition protocol, a
secret shared input $[x]_p$ is converted to a bit-wise sharing $[x_0]_p, \dots, [x_{\ell-1}]_p$ such that
$x = \sum_{i=0}^{\ell-1} 2^i x_i$, where the input $x \in \mathbb{Z}_p$ for some prime $p$.

Making use of bit-decomposition in the public base setting allows the adaptation of
the well-known “square-and-multiply” algorithm (also referred to as “exponentiation
by squaring” or “binary exponentiation”) to the MPC setting. More specifically,
considering a public base $a$ and a secret shared exponent $[x]_p$, using bit-decomposition the
secret shared bit representation $[x_0]_p, \dots, [x_{\ell-1}]_p$, where $x = \sum_{i=0}^{\ell-1} 2^i x_i$ and $x_i \in \{0,1\}$,
can be obtained. Then, using the square-and-multiply algorithm, the shares $[a^x]_p$ can be
obtained from this equation:

\[ a^x = a^{\sum_{i=0}^{\ell-1} 2^i x_i} = \prod_{i=0}^{\ell-1} \left(a^{2^i}\right)^{x_i} = \prod_{i=0}^{\ell-1} \left(x_i a^{2^i} + 1 - x_i\right). \]


While a protocol based on this can be implemented in $O(1)$ rounds, $O(\ell \log \ell)$
invocations of the underlying multiplication MPC protocol are required¹ due to the cost of
bit-decomposition. The relatively high cost in terms of multiplications is a disadvantage
of this approach.
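The product form of the square-and-multiply identity can be checked in the clear with a short Python sketch. This is not an MPC protocol: the bits $x_i$ are plain integers here, whereas in the bit-decomposition approach they would be secret-shared and each product would cost a secure multiplication. The function names are hypothetical.

```python
def bits_le(x, ell):
    """Little-endian bit decomposition: x = sum_i 2^i * x_i."""
    return [(x >> i) & 1 for i in range(ell)]

def exp_from_bits(a, x_bits, p):
    """Evaluate a^x mod p as prod_i (x_i * a^(2^i) + 1 - x_i)."""
    result, power = 1, a % p  # power holds the public value a^(2^i)
    for x_i in x_bits:
        # The factor equals a^(2^i) if x_i = 1 and 1 if x_i = 0; it is linear
        # in x_i, so it can be computed locally on a share of x_i.
        result = result * (x_i * power + 1 - x_i) % p
        power = power * power % p
    return result

if __name__ == "__main__":
    assert exp_from_bits(7, bits_le(45, 6), 101) == pow(7, 45, 101)
```

Each factor is linear in $x_i$ because the base is public, so only the multiplications between the $\ell$ factors require interaction, on top of the cost of the bit-decomposition itself.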

To avoid this, previous works [NX11,AAN18] focused on how to construct a public
base exponentiation protocol without relying on bit-decomposition techniques. In
particular, Aly, Abidin, and Nikova [AAN18] proposed a highly efficient public base
exponentiation protocol that requires only 3 rounds and 6 invocations of
multiplication. While their protocol is efficient, it depends on Shamir’s secret sharing scheme.²


Online/Offline Paradigm and Additive Sharing. The online/offline paradigm using
preprocessed random shares called “Beaver triple” or “multiplication triple” [Bea92]
is a well-known and easy way to implement multiplication in the dishonest-majority
setting. These dishonest-majority protocols consist of a preprocessing phase for
generating Beaver triples (called the “offline phase”) and an MPC protocol for the
function to be computed which consumes Beaver triples (called the “online phase”). To the
best of our knowledge, all known efficient offline protocols generating Beaver triples
are designed for additive secret sharing, such as protocols using homomorphic
encryption [DPSZ12, DKL+13,KPR18], or oblivious transfer [ALSZ15,KOS16]. Therefore,
MPC based on additive secret sharing is useful in that it is easily integrated with these
protocols.


1 Since the multiplication MPC protocol is dominant in the communication, the communication
complexity of MPC is usually measured by the number of invocations of multiplication.

2 To construct an exponentiation protocol over additive secret sharing, we could consider
utilizing share conversion between Shamir and additive secret sharing. However, [AAN18]
additionally assumes the base and the exponent are shared by different moduli, which implies an
additional modulus conversion is needed. These aspects make this approach more expensive.

On the other hand, the BGW protocol [BGW88], which is a well-known MPC
protocol based on Shamir’s secret sharing scheme, is limited in its scope to the honest
majority setting (that is, the number of corrupted parties is bounded by n/2).
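The online/offline paradigm with Beaver triples mentioned above can be sketched for two parties as follows. This is a minimal illustration under stated assumptions: the offline phase is played by a trusted dealer (real protocols generate triples with homomorphic encryption or oblivious transfer), both parties' shares are held in one process, the modulus $2^{61} - 1$ is an arbitrary prime, and the function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # a prime modulus (illustrative choice)

def share(x):
    """Two-party additive sharing of x over Z_P."""
    s1 = secrets.randbelow(P)
    return s1, (x - s1) % P

def open_(sh):
    """Both parties exchange their shares and add them."""
    return (sh[0] + sh[1]) % P

def dealer_triple():
    """Offline phase, played here by a trusted dealer (an assumption)."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def beaver_mul(x_sh, y_sh, triple):
    """Online phase: shares of x*y from shares of x, y and one triple."""
    a_sh, b_sh, c_sh = triple
    # The parties open d = x - a and e = y - b; these values reveal nothing
    # about x and y because a and b are uniform one-time masks.
    d = open_(((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P))
    e = open_(((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P))
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by party 1 only.
    z1 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z2 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z1, z2

if __name__ == "__main__":
    z = beaver_mul(share(123), share(456), dealer_triple())
    assert open_(z) == 123 * 456 % P
```

One multiplication consumes one triple and one round of opening, which is why the number of multiplication invocations is the natural cost measure in this setting.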


1.2 Our Contribution 


Based on the above motivation, we propose a new public base exponentiation protocol 
without bit-decomposition based on additive secret sharing with the following three 
contributions. 


New Framework for Exponentiation Protocol. At first, we propose a new framework
for exponentiation constructed via a quotient transfer (QT) functionality.³ In this
framework, we realize a constant round public base exponentiation protocol based
on an additive secret sharing scheme.

Efficient Exponentiation Protocol Based on Constrained QT Protocol. To obtain
an efficient exponentiation protocol in our framework, we propose a limited
QT protocol (which we denote a constrained QT protocol) without relying on
bit-decomposition. Here, constrained means that our QT protocol only works for even
integers as input. Since we bypass the use of a bit-decomposition protocol, we
succeed in reducing the complexity of our QT protocol. Combining our framework and
constrained QT protocol, we obtain an efficient public base exponentiation protocol
based on additive secret sharing.
Note that our exponentiation protocol has the limitation that inputs should be less 
than half of the underlying modulus. That is, compared to the existing exponentia- 
tion protocols, an additional condition 2x < p is required for our protocol, where 
x is the input and p is the modulus of the underlying group. We believe that this 
limitation is not significant for many practical applications. 

Modulus Conversion Protocol Based on Constrained QT Protocol. In order to uti- 
lize our constrained QT protocol effectively, the secret shared exponent must be 
multiplied by two to ensure it is even, which in turn leads to the requirement that 
the public base is a quadratic residue in the group over which the exponent is shared. 
To address this limitation, we also propose a new modulus conversion protocol 
that enables the efficient conversion of additive shares over a prime field to addi- 
tive shares over a different prime field. Using this we can ensure that the public 
base is always a quadratic residue via an appropriate conversion before running our 
exponentiation protocol. The modulus conversion protocol is likewise based on our 
constrained QT protocol, and to the best of our knowledge, outperforms existing 
protocols. This might be of independent interest. 


As the most important advantage, our resulting exponentiation protocol requires
only 3 rounds and 4 invocations of multiplication even in the case that we need our
modulus conversion protocol as a subroutine. Moreover, if modulus conversion is not
required, our exponentiation protocol only requires 2 rounds and 3 invocations of
multiplication. We furthermore note that our modulus conversion protocol requires only 1
round and 1 invocation of multiplication.


3 QT was implicitly defined by [KIM+18]. In addition, in [OWIO19], a part of their protocol
can be seen as a QT protocol based on bit-decomposition, even though they did not directly
highlight this as a QT protocol.

Table 1. Comparison between two-party (public base) exponentiation protocols.

Protocol                         Tool (a)   DM frame. comp. (b)  BD (c)  Rounds  Multiplication  Multiplication, ℓ = 64 (e)
-------------------------------  ---------  -------------------  ------  ------  --------------  --------------------------
[DFK+06]                         Linear     Yes                  Yes     119     O(ℓ log ℓ)      50176
[NX11] (d)                       Linear     Yes                  No      20      O(ℓ)            10508
[AAN18]                          Shamir’s   No                   No      3       O(1)            6
This work (with conversion) (f)  Additive   Yes                  No      3       O(1)            4
This work (w/o conversion) (f)   Additive   Yes                  No      2       O(1)            3

(a) In this column, “Linear” stands for (general) linear secret sharing, “Shamir’s” stands for Shamir’s secret
sharing, and “Additive” stands for additive secret sharing. We note that a secret sharing scheme is called
linear if the reconstruction procedure of the scheme is a linear mapping. Linear secret sharing includes
Shamir’s secret sharing and additive secret sharing.

(b) In this column, we note whether each protocol is compatible with dishonest-majority (DM)
frameworks.

(c) In this column, we point out whether each protocol requires bit-decomposition (BD) or not.

(d) The proposed protocol is a private exponent type protocol, not a public base type protocol. As the
former implies the latter, in our comparison, we use their private exponent type protocol as a public base
type.

(e) We consider the case ℓ = 64 when estimating the number of multiplications.

(f) Here, we consider two cases: whether we need modulus conversion or not. As mentioned in Sect. 1.2,
in our protocol, if the public base is not a quadratic residue, we require an additional modulus
conversion. In this case, when our modulus conversion is used, we need 1 additional round and 1 additional
invocation of multiplication.


1.3 Existing Exponentiation Protocols Without Bit-Decomposition 


In this section, we compare the efficiency of the existing exponentiation protocols and 
summarize the comparison in Table 1. Up until now, as mentioned in Sect. 1.1, there 
have been a few works on exponentiation protocols not relying on a bit-decomposition 
protocol. 

In 2011, Ning and Xu [NX11] introduced private exponentiation type protocols
without bit-decomposition. As a result, they obtain a protocol with 20 rounds and
$164 \cdot \ell + 12$ invocations of multiplication for a public base, where $\ell$ is the number of message
bits. In particular, when we consider $\ell = 64$, the number of invocations of multiplication
is 10508.

Recently, Aly, Abidin, and Nikova [AAN18] simplified Ning et al.’s protocol, and
reduced the communication complexity and the number of invocations of multiplication
based on Shamir’s secret sharing scheme. They also constructed a new public
exponent exponentiation protocol. Regarding the public base exponentiation protocol, the
number of rounds is 3 and the number of multiplication invocations is $3(1 + \lceil \log(n) \rceil)$,
where $n$ means the number of parties. In particular, in the two-party setting (that is,
$n = 2$), the number of invocations of multiplication is 6. We note that their protocol
needs to use different moduli in the groups of base and exponentiation in order to ensure
correctness. This is a drawback when considering composition with other protocols (not
only in a theoretical sense but also in an implementation). Compared to their protocol, our
protocol uses the same modulus in the groups of base and exponentiation.

In Table 1, we compare our work with the results by Ning et al. and Aly et al. in the
two-party setting (all protocols without bit-decomposition), and the result by [DFK+06]
which uses bit-decomposition. There, $\ell$ denotes the bit-length of the input. In the rows
“[DFK+06]” and “[NX11]”, we consider the case $\ell = 64$ when estimating the number
of multiplications.

Although we obtain an efficient constant-round MPC protocol for an exponentiation 
functionality in the two-party setting, it is still an open problem to extend our protocol 
to the three or (more general) n-party setting. The main difficulty is to extend our QT 
protocol to the n-party setting efficiently. See Sect. 3.3 for the details. 


1.4 Technical Overview 


In the following, we will outline the main ideas behind our constructions. 


Local Exponentiation. The main idea behind our approach is to make the
computing parties do most of the computation locally. In particular, the exponentiation itself is
done locally based on the shares of the exponent $x$. Let us for a moment assume that
the two parties hold shares $[x]_p^1, [x]_p^2 \in \mathbb{Z}_p$, respectively, and that $[x]_p^1 + [x]_p^2 = x$ over
the integers i.e. no reduction modulo $p$ is required to recover $x$. In this case, the
parties can directly compute $y_i = a^{[x]_p^i} \bmod p$, which will satisfy
$\sigma = y_1 \cdot y_2 \bmod p = a^{[x]_p^1 + [x]_p^2} \bmod p = a^x \bmod p$. Shares of $\sigma$ can be obtained
by letting each party compute a sharing of $y_i$, send one share to the other party⁴, and let
both parties interact in a standard multiplication protocol to compute $[\sigma]_p$.

However, a standard secure sharing of $x$ requires a potential reduction modulo $p$
when adding the shares i.e. $[x]_p^1 + [x]_p^2 = x \bmod p$, which implies $[x]_p^1 + [x]_p^2 = x + t \cdot p$
for $t \in \{0,1\}$ over the integers. In other words, the above value $\sigma$ will in this case be
of the form $\sigma = a^x \cdot a^{t \cdot p}$. Here, $a^{t \cdot p} = a^t \bmod p$ due to Fermat’s little theorem, and we
observe that the term $a^t$ can be eliminated from $\sigma$, assuming the parties can compute
(shares of) $t$, by multiplying $\sigma$ with the multiplicative inverse $a^{-1}$ conditioned on the
value of $t$.
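The effect of the quotient $t$ on locally computed exponentiations can be checked with a small Python sketch. It works entirely in the clear and is only meant to illustrate the arithmetic: in a protocol, $t$ and the product of $y_1$ and $y_2$ would be handled on shares. The function names and the toy modulus are hypothetical, and the base $a$ is assumed to be non-zero modulo $p$.

```python
import secrets

def share2(x, p):
    """Two-party additive sharing of x over Z_p."""
    x1 = secrets.randbelow(p)
    return x1, (x - x1) % p

def local_exp_with_quotient(a, x1, x2, p):
    """Return (sigma, t), where x1 + x2 = x + t*p over the integers."""
    y1, y2 = pow(a, x1, p), pow(a, x2, p)  # computed locally by each party
    sigma = y1 * y2 % p                    # = a^x * a^(t*p) = a^x * a^t mod p
    t = (x1 + x2) // p                     # in the clear, for illustration only
    return sigma, t

def correct(sigma, t, a, p):
    """Remove the stray factor a^t by multiplying with (a^-1)^t."""
    a_inv = pow(a, p - 2, p)               # Fermat inverse, needs a != 0 mod p
    return sigma * pow(a_inv, t, p) % p

if __name__ == "__main__":
    p, a, x = 101, 5, 77
    x1, x2 = share2(x, p)
    sigma, t = local_exp_with_quotient(a, x1, x2, p)
    assert correct(sigma, t, a, p) == pow(a, x, p)
```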


Efficient Quotient Transfer. The above approach assumes that the parties can effi- 
ciently compute the value t. This was implicitly defined by Kikuchi et al. [KIM+18] as 
a QT protocol. Note, however, that the efficiency of this protocol is crucial in the above 
approach to exponentiation, and basing the QT protocol on e.g. bit-decomposition as 
done by [OWIO19], will defy the purpose of this approach. 

We propose a simple but efficient approach to a QT protocol constrained to even
inputs. Specifically, we observe that if the input $x$ is even, the value of $t$ can be
determined by the least significant bits of the shares $[x]_p^1$ and $[x]_p^2$, as $t = 1$
implies that $[x]_p^1 + [x]_p^2$ over the integers must be odd as the prime $p$ is likewise
odd, whereas $t = 0$ implies that $[x]_p^1 + [x]_p^2$ over the integers must be even (as the
input is likewise even). Hence, the QT protocol can in this case be implemented via the
appropriate comparison of the least significant bits of the shares $[x]_p^1$ and $[x]_p^2$.
To make use of this constrained QT protocol in the computation of an exponentiation, we
simply multiply the input $x$ by 2, and compute $(\sqrt{a})^{2x}$.

⁴ In our actual exponentiation protocol given in Algorithm 1, each party locally sets the
shares $[y_i]_p^i = y_i$ and $[y_i]_p^{1-i} = 0$ (and does not send their shares to each
other) in order to optimize the round complexity.
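As a small worked instance of the parity argument, with values chosen here purely for illustration, take $p = 7$ and the even input $x = 4$. One sharing wraps around the modulus and one does not:

```latex
\begin{align*}
5 + 6 &= 11 = 4 + 1 \cdot 7, & \mathrm{LSB}(5) \oplus \mathrm{LSB}(6) &= 1 \oplus 0 = 1 = t,\\
1 + 3 &= 4 = 4 + 0 \cdot 7,  & \mathrm{LSB}(1) \oplus \mathrm{LSB}(3) &= 1 \oplus 1 = 0 = t.
\end{align*}
```

In both rows the exclusive-or of the least significant bits of the two shares equals the quotient $t$, because $x$ is even and $p$ is odd.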


Modulus Conversion. The above assumes that an appropriate value $\sqrt{a}$ can be
computed, i.e., that $a$ is a quadratic residue modulo the prime $p$. Note that since $p$
is prime, half of all elements in $\mathbb{Z}_p^*$ are quadratic residues, and if this is
the case for $a$, the above approach works. However, if $a$ is a quadratic non-residue, a
different approach is required. We address this case by simply converting the shares of
$x$ in $\mathbb{Z}_p$ to shares in $\mathbb{Z}_{p'}$ for a prime $p'$ for which $a$ is a
quadratic residue.

Similar to the above, for this to work, an efficient modulus conversion protocol is
required. We obtain this by observing that our QT protocol allows the shares of the
quotient $t$ to be drawn from $\mathbb{Z}_{p'}$, as opposed to $\mathbb{Z}_p$ to which the
shares of the input $x$ belong, which in turn allows us to construct a very efficient
modulus conversion protocol (the details are given in Sect. 3.2). Note that the restriction
that the input $x$ is even can easily be overcome by first multiplying $x$ by 2, doing the
conversion, and then multiplying the result by $2^{-1}$, which are both local operations.
In comparison to the efficient conversion protocol by [KIM+18], which is based on bit-wise
processing of the input, our protocol is simpler and more efficient. Concretely, while the
modulus conversion protocol of [KIM+18] requires $O(\log p')$ rounds and $O(\log p')$
invocations of multiplication, our protocol requires only 1 round and 1 invocation of
multiplication.


2 Preliminaries 


In this section, we review some preliminaries. 


2.1 Notations 


In this paper, we use the following notations. $x \leftarrow X$ denotes sampling an
element $x$ from a finite set $X$ uniformly at random. $y \leftarrow A(x; r)$ denotes that
a probabilistic algorithm $A$ outputs $y$ for an input $x$ using a randomness $r$, and we
simply write $y \leftarrow A(x)$ when we need not write the internal randomness explicitly.
$\lambda$ denotes a security parameter. A function $f(\lambda)$ is a negligible function in
$\lambda$ if $f(\lambda)$ tends to 0 faster than $1/\lambda^c$ for every constant $c > 0$.
$\mathrm{negl}(\lambda)$ denotes an unspecified negligible function. PPT stands for
probabilistic polynomial time. $p$ and $p'$ denote safe prime numbers. $n$ denotes the
number of parties. Let $[x]_p$ denote a secretly shared input $x \in \mathbb{Z}_p$. Let
$P$ be the set of $n$ parties and $P_i$ the $i$-th party for $i = 1, \ldots, n$. The
operation $+$ is a normal addition over the integers. The congruence relation
$a = kp + b$ is represented as $a \equiv b \pmod{p}$, where $a$, $b$, $k$, and $p$ are
integers.


2.2 Additive Secret Sharing


In this section, we introduce the definition of additive secret sharing. In general, an
additive secret sharing scheme is defined over finite additive groups. Among them, we
consider the case over $\mathbb{Z}_p$ (the set of integers modulo $p$). An additive secret
sharing scheme over $\mathbb{Z}_p$ consists of the following two algorithms Share and
Reconstruct.


— Share: Given a value $x \in \mathbb{Z}_p$ as input, this algorithm outputs shares
$[x]_p = ([x]_p^1, \ldots, [x]_p^n)$ of $x$ such that
$[x]_p^1 + [x]_p^2 + \cdots + [x]_p^n \equiv x \pmod{p}$, where $[x]_p^i$ denotes $P_i$'s
share. All shares are distributed uniformly at random in $\mathbb{Z}_p$ under the
constraint that they sum to $x$. In the following, we use the notation
$[x]_p \leftarrow \mathrm{Share}(x)$.

— Reconstruct: Given all $n$ shares $[x]_p$ as input, this algorithm outputs the value
$x = ([x]_p^1 + \cdots + [x]_p^n) \bmod p$.


Note that the requirement on the random distribution of shares implies that, given
access to only $n - 1$ shares in $[x]_p$, the value $x$ is information-theoretically hidden.
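The two algorithms can be sketched in a few lines. This is a minimal illustration assuming Python's `secrets` module as the randomness source; the function names `share` and `reconstruct` are chosen here and are not taken from the paper.

```python
import secrets

def share(x, p, n=2):
    """Share: split x in Z_p into n additive shares that sum to x modulo p."""
    shares = [secrets.randbelow(p) for _ in range(n - 1)]  # n-1 uniform shares
    shares.append((x - sum(shares)) % p)                   # last share fixes the sum
    return shares

def reconstruct(shares, p):
    """Reconstruct: add all n shares and reduce modulo p."""
    return sum(shares) % p
```

Any $n - 1$ of the returned values are uniformly distributed, so they carry no information about $x$ on their own.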
An additive secret sharing scheme supports the following computations on shares. 


— Local operation: Given shares $[a]_p$, $[b]_p$ and a scalar $\alpha \in \mathbb{Z}_p$,
the parties can generate shares $[a + b]_p$, $[\alpha a]_p$, and $[\alpha + a]_p$ using
only local operations.

— Multiplication: Given shares $[a]_p$ and $[b]_p$, we assume the parties can generate
$[ab]_p$ by invoking an ideal multiplication functionality
$\mathcal{F}_{\mathrm{Mul}}([a]_p, [b]_p)$. This might be implemented using a
multiplication protocol based on Beaver triples [Bea92]. In the following, we use the
notation $[ab]_p \leftarrow [a]_p \cdot [b]_p$ to denote
$[ab]_p \leftarrow \mathcal{F}_{\mathrm{Mul}}([a]_p, [b]_p)$.
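The supported computations can be illustrated for two parties as follows. This is a sketch under stated assumptions: the Beaver triple is produced by a trusted dealer inside `beaver_mul`, which stands in for whatever preprocessing a real deployment would use, and all function names are hypothetical.

```python
import secrets

def share(x, p):
    s0 = secrets.randbelow(p)
    return [s0, (x - s0) % p]

def reconstruct(s, p):
    return sum(s) % p

def add_shares(sa, sb, p):
    """[a + b]: each party adds its two shares locally."""
    return [(u + v) % p for u, v in zip(sa, sb)]

def scale_shares(alpha, sa, p):
    """[alpha * a]: each party scales its share locally."""
    return [(alpha * u) % p for u in sa]

def add_public(alpha, sa, p):
    """[alpha + a]: only one designated party adds the public constant."""
    return [(sa[0] + alpha) % p, sa[1]]

def beaver_mul(sa, sb, p):
    """[a * b] from a Beaver triple (u, v, w = u*v) dealt by a trusted dealer."""
    u, v = secrets.randbelow(p), secrets.randbelow(p)
    su, sv, sw = share(u, p), share(v, p), share(u * v % p, p)
    # The parties open the masked values e = a - u and f = b - v.
    e = reconstruct([(sa[i] - su[i]) % p for i in range(2)], p)
    f = reconstruct([(sb[i] - sv[i]) % p for i in range(2)], p)
    # a*b = w + e*v + f*u + e*f; the public term e*f is added by party 0.
    sc = [(sw[i] + e * sv[i] + f * su[i]) % p for i in range(2)]
    sc[0] = (sc[0] + e * f) % p
    return sc
```

Opening $e$ and $f$ takes one round of communication, which matches the single-round multiplication assumed in the efficiency analysis later in the paper.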


2.3 A Model of Secure Two-Party Computation 


In this section, we formally introduce two-party computation. A two-party computation
is specified by a (possibly probabilistic) procedure referred to as a functionality. Denote
$f : (\{0,1\}^*)^2 \rightarrow (\{0,1\}^*)^2$ as the two-ary functionality. Specifically,
each party $P_i$ can obtain distinct outputs $f_i(\mathbf{x})$ in general, where
$f = (f_1, f_2)$ and $\mathbf{x} = (x_1, x_2)$ is a pair of inputs.


Definition of Security. The security of MPC is formalized by simulation-based secu- 
rity definitions. Roughly speaking, if there exist simulators who can generate the view 
of each party in the execution from given inputs and outputs, an MPC protocol is called 
secure. This formalization implies that each party learns nothing about other users’ 
inputs from the execution of the protocol, except for the information that can be derived 
from outputs. 


Definition 1 (Computational Indistinguishability). Two probability ensembles
$X = \{X(a, n)\}_{a \in \{0,1\}^*, n \in \mathbb{N}}$ and
$Y = \{Y(a, n)\}_{a \in \{0,1\}^*, n \in \mathbb{N}}$ are said to be computationally
indistinguishable, denoted by $X \overset{c}{\equiv} Y$, if for any non-uniform PPT
algorithm $D$ there exists a negligible function $\mathrm{negl}(\lambda)$ such that for
every $a \in \{0,1\}^*$ and every $n \in \mathbb{N}$,

$|\Pr[D(X(a, n)) = 1] - \Pr[D(Y(a, n)) = 1]| \le \mathrm{negl}(\lambda)$.

Definition 2 (Security). Let $f : (\{0,1\}^*)^2 \rightarrow (\{0,1\}^*)^2$ be a 2-ary
functionality and let $\pi$ be a two-party protocol for computing $f$. Let
$\mathbf{x} = (x_1, x_2)$ be a pair of inputs. Let $\mathrm{View}_i^{\pi}(\mathbf{x}, \lambda)$
be the view of the party $P_i$ during an execution of the protocol $\pi$ on the input
$\mathbf{x}$ and $\lambda$. Let $\mathrm{Output}^{\pi}(\mathbf{x}, \lambda)$ be the output of
all parties from an execution of $\pi$. We say that a protocol $\pi$ securely computes $f$
in the presence of semi-honest adversaries if there exist PPT algorithms $S_1$ and $S_2$
such that

$\{(S_1(1^\lambda, x_1, f_1(\mathbf{x})), f(\mathbf{x}))\}_{\mathbf{x},\lambda} \overset{c}{\equiv} \{(\mathrm{View}_1^{\pi}(\mathbf{x}, \lambda), \mathrm{Output}^{\pi}(\mathbf{x}, \lambda))\}_{\mathbf{x},\lambda}$

and

$\{(S_2(1^\lambda, x_2, f_2(\mathbf{x})), f(\mathbf{x}))\}_{\mathbf{x},\lambda} \overset{c}{\equiv} \{(\mathrm{View}_2^{\pi}(\mathbf{x}, \lambda), \mathrm{Output}^{\pi}(\mathbf{x}, \lambda))\}_{\mathbf{x},\lambda}$,

where $x_1, x_2 \in \{0,1\}^*$ and $|x_1| = |x_2|$.

In addition, we say that $\pi$ securely computes $f$ in the presence of semi-honest
adversaries in the $\mathcal{F}$-hybrid model if $\pi$ contains ideal calls to a trusted
party computing a certain functionality $\mathcal{F}$.


Remark 1 (On Local Computations). Note that the functionalities with only a local 
computation (i.e., a computation which needs no communication among parties) obvi- 
ously satisfy the above Definition 2, since the view of such functionality is only the 
information that can be obtained from shares. Such a view leaks no information regard- 
ing inputs due to the security of the underlying secret sharing scheme. 


Universal Composability Framework. A stronger notion of security typically consid- 
ered for MPC can be obtained via the Universal Composability (UC) framework, which 
is a general framework allowing arbitrary MPC protocols to be represented and ana- 
lyzed. Protocols that are proven secure in the UC framework have the property that they 
maintain their security when run in parallel and concurrently with other secure and inse- 
cure protocols. In [KLR10], Kushilevitz, Lindell, and Rabin showed that under certain 
circumstances, stand-alone security as defined above (Definition 2), implies security in 
the UC framework: 


Theorem 1 (Theorem 1.5 in [KLR10]). Every protocol that is secure in the stand- 
alone model and has start synchronization and a straight-line black-box simulator is 
UC-secure under concurrent general composition (universal composition). 


In the above theorem, a "straight-line" simulator means a non-rewinding simulator,
and "start synchronization" means that the inputs of all parties are fixed before the
execution begins (also called "input availability"). All of the protocols considered in
this paper satisfy these conditions, and hence, Theorem 1 ensures that we obtain
UC-security of our protocols.


2.4 Quotient Transfer Functionality 


We now introduce the quotient transfer (QT) functionality $\mathcal{F}_{\mathrm{QT}}$ for
a two-party protocol. This functionality plays a central role in our construction of an
exponentiation protocol. Let $[a]_p = ([a]_p^0, [a]_p^1)$. Here, we have a value $t$
satisfying $[a]_p^0 + [a]_p^1 = a + t \cdot p$ ($t \in \{0, 1\}$) over the integers. We
define the QT functionality as

$[t]_{p'} \leftarrow \mathcal{F}_{\mathrm{QT}}([a]_p, p')$.

Note that, if $[a]_p^0 + [a]_p^1 < p$, then $t = 0$, else $t = 1$. We emphasize that
although the individual shares $[a]_p^i$ are elements in $\mathbb{Z}_p$, the addition
considered above is over the integers.

2.5 Modulus Conversion Functionality 


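We introduce the modulus conversion functionality $\mathcal{F}_{\mathrm{Conv}}$. A modulus
conversion functionality is a functionality that converts a share in $\mathbb{Z}_p$ into
one in $\mathbb{Z}_{p'}$ (with $p \neq p'$). Let $x \in \mathbb{Z}_p$. We define
$\mathcal{F}_{\mathrm{Conv}}$ as

$[x]_{p'} \leftarrow \mathcal{F}_{\mathrm{Conv}}([x]_p, p')$.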


2.6 Exponentiation Functionality 


Here, we introduce a (public base) exponentiation functionality
$\mathcal{F}_{\mathrm{Exp}}$. Let $p$ be some prime. Let $a \in \mathbb{Z}_p$ and
$x \in \mathbb{Z}_p$. We define $\mathcal{F}_{\mathrm{Exp}}$ as

$[a^x \bmod p]_p \leftarrow \mathcal{F}_{\mathrm{Exp}}(a, [x]_p)$.

Note that, as we will consider additive secret sharing of $x$ over $\mathbb{Z}_p$, we
define the above exponentiation functionality for $x \in \mathbb{Z}_p$, whereas the
exponent space typically considered for exponentiation in $\mathbb{Z}_p$ would be
restricted to $\mathbb{Z}_{\varphi(p)} = \mathbb{Z}_{p-1}$. However, the functionality
remains well-defined for the extension $x \in \mathbb{Z}_p$.


3 Our Exponentiation Protocol 


In this section, we propose our exponentiation protocol. We first introduce a new frame- 
work for an exponentiation protocol in Sect. 3.1. Then, we provide a modulus conver- 
sion protocol using a QT protocol in Sect. 3.2. Next, we provide a constrained QT pro- 
tocol without bit-decomposition which only works on even numbers in Sect. 3.3. In the 
end, we introduce a concrete construction of our framework of an exponentiation proto- 
col using our constrained QT protocol and modulus conversion protocol (which is also 
obtained by our constrained QT protocol) in Sect. 3.4. 


3.1 A New Framework for Exponentiation Protocol 


In this section, we provide our new framework for an exponentiation protocol. Before 
describing our framework formally, we give its overview. 

In the public base exponentiation setting, a naive idea for computing $a^x \bmod p$ is
that each party $P_i$ ($i \in \{0,1\}$) locally computes $a^{[x]_p^i} \bmod p$ and invokes
a multiplication protocol to get the value $a^{[x]_p^0 + [x]_p^1} \bmod p$. Here, a subtle
point is that $[x]_p^0 + [x]_p^1$ is not always equal to $x$, since $[x]_p^0 + [x]_p^1 = x + p$
may hold. That is, we may have
$a^{[x]_p^0 + [x]_p^1} \bmod p = a^{x+p} \bmod p = (a^x \cdot a^p) \bmod p$, and in this case
we should eliminate the term $a^p \bmod p$ to get the correct result $a^x \bmod p$. In order
to solve this problem, we utilize a QT functionality $\mathcal{F}_{\mathrm{QT}}$. By using
$\mathcal{F}_{\mathrm{QT}}$, we can obtain the secret shared value $[t]_p$ such that
$[x]_p^0 + [x]_p^1 = x + t \cdot p$ and eliminate the term $a^p \bmod p$ correctly. Formally,
our new framework $\Pi_{\mathrm{Exp}}$ for $\mathcal{F}_{\mathrm{Exp}}$ is described in
Algorithm 1.


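Algorithm 1. Our framework for exponentiation protocol $\Pi_{\mathrm{Exp}}$

Input: $a$, $[x]_p$
Output: $[o]_p$

1: Each $P_i$ ($i \in \{0, 1\}$) locally computes $y_i = a^{[x]_p^i} \bmod p$
2: Each $P_i$ ($i \in \{0, 1\}$) locally sets $[y_i]_p^i = y_i$ and $[y_i]_p^{1-i} = 0$
3: $[d]_p \leftarrow [y_0]_p \cdot [y_1]_p$
4: $[t]_p \leftarrow \mathcal{F}_{\mathrm{QT}}([x]_p, p)$
5: $[o_1]_p \leftarrow (1 - [t]_p) \cdot [d]_p$
6: $[o_2]_p \leftarrow [t]_p \cdot [d]_p$
7: $[o]_p = [o_1]_p + [o_2]_p \cdot (a)^{-1}$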

Correctness. Here, we show the correctness of our framework of an exponentiation
protocol $\Pi_{\mathrm{Exp}}$.


Theorem 2. $\Pi_{\mathrm{Exp}}$ is correct in the
$(\mathcal{F}_{\mathrm{Mul}}, \mathcal{F}_{\mathrm{QT}})$-hybrid model.

The protocol $\Pi_{\mathrm{Exp}}$ is aimed at correctly computing $a^x \bmod p$, where
$a \in \mathbb{Z}_p$ and $x \in \mathbb{Z}_p$. Each party $P_i$ first computes
$a^{[x]_p^i} \bmod p$ and sets $[y_i]_p^i = y_i$ and $[y_i]_p^{1-i} = 0$ locally, then
utilizes a multiplication MPC protocol to compute
$(a^{[x]_p^0} \bmod p) \cdot (a^{[x]_p^1} \bmod p) = a^{[x]_p^0 + [x]_p^1} \bmod p$. Here, we
need to handle two cases for the value $x$. Concretely, we have
$[x]_p^0 + [x]_p^1 = x + t \cdot p$ ($t \in \{0,1\}$). In the following proof, we obtain the
correct value $a^x$ in both of the cases $[x]_p^0 + [x]_p^1 < p$ and
$[x]_p^0 + [x]_p^1 \ge p$.

Proof of Theorem 2. Let $a \in \mathbb{Z}_p$ and $x \in \mathbb{Z}_p$. Regarding the secret
value $x$, we have the following two cases.


— In the case of $[x]_p^0 + [x]_p^1 < p$ (that is, we have $t = 0$ and
$[x]_p^0 + [x]_p^1 = x$), each $P_i$ locally computes $a^{[x]_p^i} \bmod p$, sets
$[y_i]_p^i = y_i$ and $[y_i]_p^{1-i} = 0$, and uses $\mathcal{F}_{\mathrm{Mul}}$ to compute
$d$. From the correctness of $\mathcal{F}_{\mathrm{Mul}}$, we have
$d = a^{[x]_p^0 + [x]_p^1} \bmod p = a^x \bmod p$. Moreover, from the correctness of
$\mathcal{F}_{\mathrm{QT}}$, we get $t = 0$. Therefore, $o_1 = d$, $o_2 = 0$, and
$o = a^x \bmod p$ hold.

— In the case of $[x]_p^0 + [x]_p^1 \ge p$ (that is, we have $t = 1$ and
$[x]_p^0 + [x]_p^1 = x + p$), each $P_i$ locally computes $a^{[x]_p^i} \bmod p$, sets
$[y_i]_p^i = y_i$ and $[y_i]_p^{1-i} = 0$, and uses $\mathcal{F}_{\mathrm{Mul}}$ to compute
$d$. From the correctness of $\mathcal{F}_{\mathrm{Mul}}$, we have
$d = a^{[x]_p^0 + [x]_p^1} \bmod p = a^{x+p} \bmod p$. Moreover, from the correctness of
$\mathcal{F}_{\mathrm{QT}}$, we get $t = 1$. Therefore, $o_1 = 0$, $o_2 = d$, and
$o = d \cdot (a)^{-1} \bmod p = (a)^{x+p} \cdot (a)^{-p} \bmod p = a^x \bmod p$ hold, where
$(a)^{-1} \equiv (a)^{-p} \pmod{p}$ by Fermat's little theorem. (Theorem 2)
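The two cases of the proof can be exercised with a plaintext simulation of Algorithm 1. In the sketch below, `f_mul` and `f_qt` are idealized stand-ins for the functionalities $\mathcal{F}_{\mathrm{Mul}}$ and $\mathcal{F}_{\mathrm{QT}}$: they open their inputs, compute in the clear and reshare, which a real protocol must never do. All names are illustrative.

```python
import secrets

def share(x, q):
    s0 = secrets.randbelow(q)
    return [s0, (x - s0) % q]

def rec(s, q):
    return sum(s) % q

def f_mul(sa, sb, q):
    """Idealized F_Mul: open both inputs, multiply, reshare the product."""
    return share(rec(sa, q) * rec(sb, q) % q, q)

def f_qt(sx, p, p_out):
    """Idealized F_QT: shares (over Z_{p_out}) of the quotient t in {0, 1}."""
    return share((sx[0] + sx[1]) // p, p_out)

def pi_exp(a, sx, p):
    """Steps 1-7 of Algorithm 1 for a public base a and shares sx of x."""
    sy0 = [pow(a, sx[0], p), 0]           # steps 1-2: P_0's local value
    sy1 = [0, pow(a, sx[1], p)]           # steps 1-2: P_1's local value
    sd = f_mul(sy0, sy1, p)               # step 3
    st = f_qt(sx, p, p)                   # step 4
    s1mt = [(1 - st[0]) % p, (-st[1]) % p]
    so1 = f_mul(s1mt, sd, p)              # step 5
    so2 = f_mul(st, sd, p)                # step 6
    inv_a = pow(a, -1, p)
    return [(so1[i] + so2[i] * inv_a) % p for i in range(2)]  # step 7
```

Because shares are sampled at random, repeated runs hit both the $t = 0$ and the $t = 1$ branch, and the reconstructed output equals $a^x \bmod p$ in either case.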
Security. Then, we prove the security of our new framework.


Theorem 3. $\Pi_{\mathrm{Exp}}$ can securely compute $\mathcal{F}_{\mathrm{Exp}}$ in the
$(\mathcal{F}_{\mathrm{Mul}}, \mathcal{F}_{\mathrm{QT}})$-hybrid model.


Proof of Theorem 3. We construct a separate simulator for each party ($S_0$ for $P_0$'s
view and $S_1$ for $P_1$'s view, as in Definition 2). Consider the case that $P_1$ is
corrupted. The view of $P_1$ can be written as:

$\mathrm{View}_1^{\Pi_{\mathrm{Exp}}}(a, [x]_p) = ([d]_p^1, [t]_p^1, [o_2]_p^1, [o_1]_p^1, r)$,

where $[d]_p = [y_0]_p \cdot [y_1]_p$, $y_i = a^{[x]_p^i} \bmod p$ for $i \in \{0,1\}$,
$[y_i]_p^i = y_i$ and $[y_i]_p^{1-i} = 0$ for $i \in \{0,1\}$, and $r$ is the randomness used
by $P_1$. We need to show that the simulator $S_1$ can generate the view of $P_1$. In the
protocol, $P_1$ receives an input consisting of the values $a$ and $[x]_p^1$. Then, $S_1$ is
given $(a, [x]_p^1, [o]_p^1)$ and works as follows:

1. $S_1$ chooses a uniform randomness $r$ from $\mathbb{Z}_p$.
2. $S_1$ chooses uniformly distributed random numbers $[d]_p^1$, $[t]_p^1$, and $[o_2]_p^1$
   from $\mathbb{Z}_p$.
3. $S_1$ computes $[o_1]_p^1 = [o]_p^1 - [o_2]_p^1 \cdot a^{-1}$.
4. $S_1$ outputs $([d]_p^1, [t]_p^1, [o_2]_p^1, [o_1]_p^1, r)$.

Due to the security of the underlying additive secret sharing scheme, $[d]_p^1$,
$[t]_p^1$, $[o_2]_p^1$ and $r$ are uniformly random in $\mathbb{Z}_p$. Moreover, the shares
$[o]_p^1$ and $[o_2]_p^1$ are distributed randomly conditioned on
$[o]_p^1 = [o_1]_p^1 + [o_2]_p^1 \cdot a^{-1}$, which is the same as in the real execution.
Thus, the distribution of $([d]_p^1, [t]_p^1, [o_2]_p^1, [o_1]_p^1)$ output by $S_1$ is
equal to the one given to $P_1$. Furthermore, due to the correctness of
$\Pi_{\mathrm{Exp}}$ (shown in Theorem 2), the output of $\Pi_{\mathrm{Exp}}(a, [x]_p)$ is
equal to the output of the functionality $\mathcal{F}_{\mathrm{Exp}}(a, [x]_p)$. Hence, we
have

$\{(S_1(a, [x]_p^1, [o]_p^1), \mathcal{F}_{\mathrm{Exp}}(a, [x]_p))\} \overset{c}{\equiv} \{(\mathrm{View}_1^{\Pi_{\mathrm{Exp}}}(a, [x]_p), \Pi_{\mathrm{Exp}}(a, [x]_p))\}$.

Similarly, in the case that $P_0$ is corrupted, we can also construct $S_0$ which can
simulate the view of $P_0$. Therefore, $\Pi_{\mathrm{Exp}}$ securely computes
$\mathcal{F}_{\mathrm{Exp}}$ in the
$(\mathcal{F}_{\mathrm{Mul}}, \mathcal{F}_{\mathrm{QT}})$-hybrid model. (Theorem 3)


Remark 2 (Existing (Inefficient) Protocols over Our Framework). As mentioned in
Sect. 1.2, we can realize $\Pi_{\mathrm{Exp}}$ using existing primitives. Specifically, a
part of the previous work [OWIO19] can be seen as a QT protocol, even though they did not
explicitly define this as a QT protocol. However, since their protocol is based on
bit-decomposition, the resulting $\Pi_{\mathrm{Exp}}$ suffers from a large multiplication
cost.


Efficiency of Our Exponentiation Framework. Here, we give an analysis of the
efficiency (round complexity and the number of invocations of multiplication) of our
exponentiation framework. In the analysis of round complexity, an important point is
that we can execute some procedures simultaneously in $\Pi_{\mathrm{Exp}}$. Concretely,
since the computation in the functionality $\mathcal{F}_{\mathrm{QT}}$ (Step 4) does not
depend on the previous steps, $\mathcal{F}_{\mathrm{QT}}$ can be executed in parallel with
the previous procedures. Taking into account this optimization, the round complexity of
$\Pi_{\mathrm{Exp}}$ is $\max(r_{\mathrm{QT}}, r_{\mathrm{Mul}}) + r_{\mathrm{Mul}}$, where
$r_{\mathrm{QT}}$ and $r_{\mathrm{Mul}}$ represent the round complexity of
$\mathcal{F}_{\mathrm{QT}}$ and $\mathcal{F}_{\mathrm{Mul}}$ respectively. Also, the number
of invocations of multiplication is $i_{\mathrm{QT}} + 2 \cdot i_{\mathrm{Mul}}$, where
$i_{\mathrm{QT}}$ and $i_{\mathrm{Mul}}$ represent the number of invocations of
multiplication of $\mathcal{F}_{\mathrm{QT}}$ and $\mathcal{F}_{\mathrm{Mul}}$ respectively.


3.2 A Modulus Conversion Protocol Using Quotient Transfer Functionality 


In this section, we provide a modulus conversion protocol which can change
$x \in \mathbb{Z}_p$ to $x \in \mathbb{Z}_{p'}$. This modulus conversion protocol consists
of the QT functionality and a transfer formula. By using the QT functionality, we can
obtain the secret shared value $[t]_{p'}$ such that $[x]_p^0 + [x]_p^1 = x + t \cdot p$.
Then, the transfer formula changes $x$ into $x \in \mathbb{Z}_{p'}$ by eliminating the
influence of the overflow. Formally, our modulus conversion protocol is described in
Algorithm 2.


Algorithm 2. Our modulus conversion protocol $\Pi_{\mathrm{Conv}}$

Input: $[x]_p$, $p'$
Output: $[x]_{p'}$

1: $[t]_{p'} \leftarrow \mathcal{F}_{\mathrm{QT}}([x]_p, p')$
2: Each $P_i$ ($i \in \{0, 1\}$) sets $[x]_{p'}^i = [x]_p^i - [t]_{p'}^i \cdot p$
3: Output $[x]_{p'}$


Correctness. Firstly, we prove the correctness of $\Pi_{\mathrm{Conv}}$.


Theorem 4. $\Pi_{\mathrm{Conv}}$ is correct in the $\mathcal{F}_{\mathrm{QT}}$-hybrid model.


Proof of Theorem 4. From the correctness of $\mathcal{F}_{\mathrm{QT}}$, we can obtain a
correct $t' \in \{0,1\}$ satisfying $[x]_p^0 + [x]_p^1 = x + t' \cdot p$. Next, we obtain
$[x]_{p'}^0 = [x]_p^0 - [t']_{p'}^0 \cdot p$ and $[x]_{p'}^1 = [x]_p^1 - [t']_{p'}^1 \cdot p$.
Then, we have the following equation:

$[x]_{p'}^0 + [x]_{p'}^1 = [x]_p^0 - [t']_{p'}^0 \cdot p + [x]_p^1 - [t']_{p'}^1 \cdot p \bmod p'$
$= [x]_p^0 + [x]_p^1 - ([t']_{p'}^0 + [t']_{p'}^1) \cdot p \bmod p'$
$= x + t' \cdot p - ([t']_{p'}^0 + [t']_{p'}^1) \cdot p \bmod p'$.

Here, since we have $[t']_{p'}^0 + [t']_{p'}^1 = t'$ when $[t']_{p'}^0 + [t']_{p'}^1 < p'$
holds and $[t']_{p'}^0 + [t']_{p'}^1 = t' + p'$ when $[t']_{p'}^0 + [t']_{p'}^1 \ge p'$
holds, $[t']_{p'}^0 + [t']_{p'}^1 \bmod p'$ equals $t'$. Then,
$t' \cdot p - ([t']_{p'}^0 + [t']_{p'}^1) \cdot p$ is equal to 0 modulo $p'$. Thus, we have
$[x]_{p'}^0 + [x]_{p'}^1 = x \bmod p'$ and get shares $[x]_{p'}^0$ and $[x]_{p'}^1$ whose sum
is equal to $x$ over $\mathbb{Z}_{p'}$. Therefore, $\Pi_{\mathrm{Conv}}$ is correct in the
$\mathcal{F}_{\mathrm{QT}}$-hybrid model. (Theorem 4)
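The derivation above can be checked with a short simulation of Algorithm 2. In this sketch `f_qt` is an idealized stand-in for $\mathcal{F}_{\mathrm{QT}}$ that computes the quotient in the clear, and the moduli used in the usage note are small primes picked only for illustration.

```python
import secrets

def share(x, q):
    s0 = secrets.randbelow(q)
    return [s0, (x - s0) % q]

def rec(s, q):
    return sum(s) % q

def f_qt(sx, p, p_out):
    """Idealized F_QT: shares over Z_{p_out} of t with sx[0] + sx[1] = x + t*p."""
    return share((sx[0] + sx[1]) // p, p_out)

def pi_conv(sx, p, p_new):
    """Algorithm 2: each party subtracts its share of t times p, modulo p_new."""
    st = f_qt(sx, p, p_new)                                  # step 1
    return [(sx[i] - st[i] * p) % p_new for i in range(2)]   # step 2
```

For example, with $p = 1019$ and $p' = 1031$, the converted shares reconstruct to $x$ over $\mathbb{Z}_{p'}$ for every $x \in \mathbb{Z}_p$; with a smaller target modulus the result is $x \bmod p'$.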


Security. Then, we prove the security of $\Pi_{\mathrm{Conv}}$.


Theorem 5. $\Pi_{\mathrm{Conv}}$ is UC-secure in the $\mathcal{F}_{\mathrm{QT}}$-hybrid model.

Proof of Theorem 5. We construct a separate simulator for each party ($S_0$ for $P_0$'s
view and $S_1$ for $P_1$'s view, as in Definition 2). Consider the case that $P_1$ is
corrupted. The view of $P_1$ can be written as:

$\mathrm{View}_1^{\Pi_{\mathrm{Conv}}}([x]_p, p') = ([t]_{p'}^1, r)$,

where $r$ is the randomness used by $P_1$. We need to show that the simulator $S_1$ can
generate the view of $P_1$. In the protocol, $P_1$ receives an input consisting of the
values $[x]_p^1$ and $p'$. Then, $S_1$ is given $([x]_p^1, [x]_{p'}^1, p')$ and works as
follows:

1. $S_1$ chooses a uniform randomness $r$ from $\mathbb{Z}_p$.
2. $S_1$ chooses a uniformly distributed random $[t]_{p'}^1$ from $\mathbb{Z}_{p'}$.
3. $S_1$ outputs $([t]_{p'}^1, r)$.

Due to the security of the underlying additive secret sharing scheme, $r$ and
$[t]_{p'}^1$ are uniformly random in $\mathbb{Z}_{p'}$ in the real execution. Thus, the
distribution of $[t]_{p'}^1$ output by $S_1$ is equal to the one given to $P_1$.
Furthermore, due to the correctness of $\Pi_{\mathrm{Conv}}$ (shown in Theorem 4), the
output of $\Pi_{\mathrm{Conv}}([x]_p, p')$ is equal to the output of the functionality
$\mathcal{F}_{\mathrm{Conv}}([x]_p, p')$. Hence, we have

$\{(S_1(p', [x]_p^1, [x]_{p'}^1), \mathcal{F}_{\mathrm{Conv}}([x]_p, p'))\} \overset{c}{\equiv} \{(\mathrm{View}_1^{\Pi_{\mathrm{Conv}}}([x]_p, p'), \Pi_{\mathrm{Conv}}([x]_p, p'))\}$.

Similarly, in the case that $P_0$ is corrupted, we can also construct $S_0$ which can
simulate the view of $P_0$. Therefore, $\Pi_{\mathrm{Conv}}$ securely computes
$\mathcal{F}_{\mathrm{Conv}}$ in the $\mathcal{F}_{\mathrm{QT}}$-hybrid model. Moreover, from
Theorem 1, $\Pi_{\mathrm{Conv}}$ is also UC-secure in the $\mathcal{F}_{\mathrm{QT}}$-hybrid
model, and thus Theorem 5 holds. (Theorem 5)


Efficiency of Our Modulus Conversion Protocol. Here, we give an analysis of the
efficiency (round complexity and the number of invocations of multiplication) of our
modulus conversion protocol $\Pi_{\mathrm{Conv}}$. The round complexity is
$r_{\mathrm{QT}}$, where $r_{\mathrm{QT}}$ represents the round complexity of
$\mathcal{F}_{\mathrm{QT}}$. Also, the number of invocations of multiplication is
$i_{\mathrm{QT}}$, where $i_{\mathrm{QT}}$ represents the number of invocations of
multiplication of $\mathcal{F}_{\mathrm{QT}}$.

As mentioned in Sect. 3.3, both the round complexity $r_{\mathrm{QT}}$ and the number of
invocations of multiplication $i_{\mathrm{QT}}$ of our constrained QT protocol are 1. Thus,
our modulus conversion protocol requires only 1 round and 1 invocation of multiplication.
Compared to the most efficient modulus conversion protocol [KIM+18], which requires
$O(\log p')$ rounds and $O(\log p')$ invocations of multiplication, our modulus conversion
protocol is more efficient.


3.3 A Constrained Quotient Transfer Protocol Without Bit-Decomposition 


In this section, we provide our constrained QT protocol without the bit-decomposition
protocol. Our core idea is that we can check whether $[x]_p^0 + [x]_p^1$ is at least $p$ or
not by using only the least significant bit (LSB) of the shares of even inputs. Namely,
our QT protocol securely computes $\mathcal{F}_{\mathrm{QT}}$ if the input is an even number.
Note that this restriction does not cause any problem when used in our exponentiation
protocol. More specifically, the input of our exponentiation protocol is not restricted to
even numbers. See Sect. 3.4 for the details.

We propose our constrained QT protocol as described in Algorithm 3. Let $p$ and $p'$
be odd primes. Regarding an input for Algorithm 3, let $x$ be even and $b_i$ the LSB of
$[x]_p^i$ for $i \in \{0, 1\}$.


Algorithm 3. Our Constrained Quotient Transfer Protocol $\Pi_{\mathrm{QT}}$

Input: $[x]_p$, $p'$
Output: $[t]_{p'}$

1: Each $P_i$ ($i \in \{0,1\}$) locally computes $b_i = \mathrm{LSB}([x]_p^i)$.
2: Each $P_i$ ($i \in \{0, 1\}$) locally sets $[b_i]_{p'}^i = b_i$ and $[b_i]_{p'}^{1-i} = 0$.
3: $[t]_{p'} = [b_0]_{p'} + [b_1]_{p'} - 2 \cdot [b_0]_{p'} \cdot [b_1]_{p'}$
4: Output $[t]_{p'}$


Remark 3 (On an extension to the n-party setting). We can easily extend our two-party QT
protocol to an n-party protocol by executing the LSB checking (Step 3 in Algorithm 3)
n times for an input x to determine how many times x exceeds the underlying modulus p.
However, this naive approach requires n invocations of multiplication and the resulting
protocol is not efficient. It is an interesting open question to extend our QT protocol
to the n-party setting efficiently (which would yield an efficient constant-round n-party
MPC protocol for an exponentiation functionality based on additive secret sharing).


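Correctness. Here, we prove the correctness of $\Pi_{\mathrm{QT}}$.


Theorem 6. If the input $x$ is even, $\Pi_{\mathrm{QT}}$ is correct in the
$\mathcal{F}_{\mathrm{Mul}}$-hybrid model.


Proof of Theorem 6. Since $x$ is even and $p$ is an odd prime, the last bit of $x$ must be
0 and the last bit of $p$ must be 1. Since $[x]_p^0 + [x]_p^1 = x + t \cdot p$
($t \in \{0,1\}$) holds, $\mathrm{LSB}([x]_p^0) \oplus \mathrm{LSB}([x]_p^1)$ has two cases.
If $[x]_p^0 + [x]_p^1 = x$ holds, then
$\mathrm{LSB}([x]_p^0) \oplus \mathrm{LSB}([x]_p^1) = 0$ holds. Otherwise,
$\mathrm{LSB}([x]_p^0) \oplus \mathrm{LSB}([x]_p^1) = 1$ holds. Thus, the value of
$\mathrm{LSB}([x]_p^0) \oplus \mathrm{LSB}([x]_p^1)$ is equal to the value $t$. Moreover, due
to the correctness of $\mathcal{F}_{\mathrm{Mul}}$, we can see that
$[b_0]_{p'} + [b_1]_{p'} - 2 \cdot [b_0]_{p'} \cdot [b_1]_{p'}$ computes the shares of
$t = \mathrm{LSB}([x]_p^0) \oplus \mathrm{LSB}([x]_p^1)$ over $\mathbb{Z}_{p'}$ in Step 3.
Therefore, if the input $x$ is even, $\Pi_{\mathrm{QT}}$ is correct in the
$\mathcal{F}_{\mathrm{Mul}}$-hybrid model. (Theorem 6)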
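The LSB argument can be tested directly. The sketch below simulates Algorithm 3 with an idealized `f_mul` in place of $\mathcal{F}_{\mathrm{Mul}}$; the function names and the small moduli in the usage note are assumptions made for illustration.

```python
import secrets

def share(x, q):
    s0 = secrets.randbelow(q)
    return [s0, (x - s0) % q]

def rec(s, q):
    return sum(s) % q

def f_mul(sa, sb, q):
    """Idealized F_Mul: open both inputs, multiply, reshare the product."""
    return share(rec(sa, q) * rec(sb, q) % q, q)

def pi_qt(sx, p_out):
    """Algorithm 3: shares over Z_{p_out} of t = LSB(sx[0]) XOR LSB(sx[1]).

    Correct only when the shared input is even and the sharing modulus is odd.
    """
    b0, b1 = sx[0] & 1, sx[1] & 1            # step 1: local LSBs
    sb0, sb1 = [b0, 0], [0, b1]              # step 2: trivial sharings
    sc = f_mul(sb0, sb1, p_out)              # the single multiplication
    # step 3: b0 XOR b1 = b0 + b1 - 2*b0*b1, evaluated on shares
    return [(sb0[i] + sb1[i] - 2 * sc[i]) % p_out for i in range(2)]
```

With $p = 1019$ and output modulus $1031$, the reconstructed result matches the integer quotient of the share sum by $p$ for every even input; for odd inputs the parity argument no longer applies.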


Security. Then, we prove the security of $\Pi_{\mathrm{QT}}$.


Theorem 7. $\Pi_{\mathrm{QT}}$ is UC-secure in the $\mathcal{F}_{\mathrm{Mul}}$-hybrid model.


Proof of Theorem 7. We construct a separate simulator for each party ($S_0$ for $P_0$'s
view and $S_1$ for $P_1$'s view, as in Definition 2). Consider the case that $P_1$ is
corrupted. The view of $P_1$ can be written as:

$\mathrm{View}_1^{\Pi_{\mathrm{QT}}}([x]_p) = ([c]_{p'}^1, r)$,

where $[c]_{p'} = [b_1]_{p'} \cdot [b_0]_{p'}$, $b_i = \mathrm{LSB}([x]_p^i)$ for
$i \in \{0,1\}$, $[b_i]_{p'}^i = b_i$ and $[b_i]_{p'}^{1-i} = 0$ for $i \in \{0,1\}$, and $r$
is the randomness used by $P_1$.

We need to show that a simulator can generate the view of $P_1$. In the protocol, $P_1$
receives an input consisting of the values $([x]_p^1, p')$. Then, $S_1$ is given
$([x]_p^1, [t]_{p'}^1, p')$ and works as follows:

1. $S_1$ chooses a uniform randomness $r$ from $\mathbb{Z}_{p'}$.
2. $S_1$ generates $b_1 = \mathrm{LSB}([x]_p^1)$.
3. $S_1$ sets $[b_1]_{p'}^1 = b_1$ and $[b_0]_{p'}^1 = 0$.
4. $S_1$ computes $[c]_{p'}^1 = ([b_1]_{p'}^1 + [b_0]_{p'}^1 - [t]_{p'}^1) \cdot 2^{-1}$.
5. $S_1$ outputs $[c]_{p'}^1$ and $r$.

The randomness $r$ is uniformly random in $\mathbb{Z}_{p'}$ and the share $[c]_{p'}^1$ is
distributed randomly conditioned on
$[t]_{p'}^1 = [b_0]_{p'}^1 + [b_1]_{p'}^1 - 2 \cdot [c]_{p'}^1$, which is the same as in the
real execution. Thus, the distribution of $([c]_{p'}^1, r)$ output by $S_1$ is equal to the
one given to $P_1$. Furthermore, due to the correctness of $\Pi_{\mathrm{QT}}$ (shown in
Theorem 6), the output of $\Pi_{\mathrm{QT}}([x]_p, p')$ is equal to the output of the
functionality $\mathcal{F}_{\mathrm{QT}}([x]_p, p')$. Hence, we have

$\{(S_1([x]_p^1, [t]_{p'}^1, p'), \mathcal{F}_{\mathrm{QT}}([x]_p, p'))\} \overset{c}{\equiv} \{(\mathrm{View}_1^{\Pi_{\mathrm{QT}}}([x]_p, p'), \Pi_{\mathrm{QT}}([x]_p, p'))\}$.

Similarly, in the case that $P_0$ is corrupted, we can also construct $S_0$ which can
simulate the view of $P_0$.

Therefore, $\Pi_{\mathrm{QT}}$ securely computes $\mathcal{F}_{\mathrm{QT}}$ in the
$\mathcal{F}_{\mathrm{Mul}}$-hybrid model. Moreover, from Theorem 1, $\Pi_{\mathrm{QT}}$ is
also UC-secure in the $\mathcal{F}_{\mathrm{Mul}}$-hybrid model, and thus Theorem 7 holds.
(Theorem 7)


Efficiency of Our Constrained QT Protocol. Here, we give an analysis of the
efficiency (round complexity and the number of invocations of multiplication) of our
constrained QT protocol $\Pi_{\mathrm{QT}}$. The round complexity is $r_{\mathrm{Mul}}$,
where $r_{\mathrm{Mul}}$ represents the round complexity of $\mathcal{F}_{\mathrm{Mul}}$.
Also, the number of invocations of multiplication is $i_{\mathrm{Mul}}$, where
$i_{\mathrm{Mul}}$ represents the number of invocations of multiplication of
$\mathcal{F}_{\mathrm{Mul}}$.


3.4 A Concrete Protocol in Our Framework 


In this section, we propose our concrete protocol $\Pi'_{\mathrm{Exp}}$ based on our
framework $\Pi_{\mathrm{Exp}}$ for the exponentiation functionality
$\mathcal{F}_{\mathrm{Exp}}$. Here, $\Pi_{\mathrm{Exp}}$ is realized by the concrete
$\Pi_{\mathrm{Conv}}$ in Sect. 3.2 using our constrained QT protocol $\Pi_{\mathrm{QT}}$ in
Sect. 3.3, and we denote this protocol by $\Pi'_{\mathrm{Exp}}$, which is described in
Algorithm 4. In the following, let $a \in \mathbb{Z}_p$ and $x \in \mathbb{Z}_p$. As
mentioned in Sect. 1.2, since we need to ensure the inputs are always even, our concrete
exponentiation protocol requires the condition $2x < p$ for inputs $x$. We note that
regarding the output $[o]_{p'}$ in Algorithm 4, if we want to convert it back to shares in
$\mathbb{Z}_p$ (as opposed to in $\mathbb{Z}_{p'}$), we just need to apply modulus
conversion to it again.


Algorithm 4. Our concrete exponentiation protocol $\Pi'_{\mathrm{Exp}}$

Input: $a$, $[x]_p$, $p'$
Output: $[o]_{p'}$

1: $b := \sqrt{a}$, where $b \in \mathbb{Z}_{p'}$
2: $[2x]_p \leftarrow 2[x]_p$
3: if $p \neq p'$ then
4:     $[2x]_{p'} \leftarrow \Pi_{\mathrm{Conv}}([2x]_p, p')$
5:     $v = [2x]_{p'}$
6: else
7:     $v = [2x]_p$
8: end if
9: Output $[o]_{p'} \leftarrow \Pi_{\mathrm{Exp}}(b, v)$


Remark 4 (On the selection of modulus $p'$). In Algorithm 4, we assume that a prime
$p'$ is known such that there exists an element $b$ satisfying $b = \sqrt{a}$ in
$\mathbb{Z}_{p'}$. Note that for a prime $p$, half of the elements in $\mathbb{Z}_p$ have
square roots in $\mathbb{Z}_p$, and if this is the case for $a$, the element
$b \in \mathbb{Z}_p$ can be found via standard algorithms. In other words, for a randomly
chosen $a$, the probability that setting $p' = p$ is sufficient, where $p$ is the prime
underlying the additive sharing of $x$, is 1/2. However, if no square root for $a$ exists
in $\mathbb{Z}_p$, a different prime $p'$ must be used. An appropriate $p'$ might be found
simply by trying a random prime $p'$, testing whether $a$ has a square root in
$\mathbb{Z}_{p'}$, and if not, trying a new random prime $p'$. Under the assumption that an
element $a \in \mathbb{Z}_p$ has a square root in $\mathbb{Z}_{p'}$ with probability 1/2 for
a randomly chosen $p'$, this approach will efficiently find an appropriate $p'$ with
overwhelming probability. We note that since $a$ is assumed to be public, finding an
appropriate $p'$ can be done before the exponentiation protocol is executed, and might be
based on publicly available information for commonly used values of $a$ and $p$.
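The trial-and-error search for $p'$ described in Remark 4 can be sketched as follows. Several simplifications here are assumptions of this example and not part of the paper: candidates are toy-sized and tested by trial division, the quadratic residuosity test uses Euler's criterion, and only primes $q \equiv 3 \pmod 4$ are accepted so that a square root can be obtained with a single exponentiation instead of a general algorithm such as Tonelli-Shanks. A real instantiation would also have to respect the condition $2x < p'$ required by Theorem 8.

```python
import secrets
from math import isqrt

def is_prime(n):
    """Trial division; adequate only for the toy sizes used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, isqrt(n) + 1, 2))

def is_quadratic_residue(a, q):
    """Euler's criterion for an odd prime q: a^((q-1)/2) = 1 mod q."""
    return pow(a % q, (q - 1) // 2, q) == 1

def sqrt_mod(a, q):
    """Square root of a modulo a prime q with q % 4 == 3."""
    if q % 4 != 3:
        raise ValueError("this shortcut needs q = 3 (mod 4)")
    b = pow(a % q, (q + 1) // 4, q)
    if b * b % q != a % q:
        raise ValueError("a is not a quadratic residue modulo q")
    return b

def find_modulus(a, p, bits=20):
    """Return p itself if it already works, else a random suitable prime."""
    if p % 4 == 3 and is_quadratic_residue(a, p):
        return p
    while True:
        # Force the top bit (size) and the two low bits (q = 3 mod 4).
        q = secrets.randbits(bits) | (1 << (bits - 1)) | 3
        if is_prime(q) and is_quadratic_residue(a, q):
            return q
```

Since $a$ is public, this search and the computation of $b$ can run before the protocol starts, as the remark points out.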


Correctness. Here, we prove the correctness of our protocol $\Pi'_{\mathrm{Exp}}$.


Theorem 8. $\Pi'_{\mathrm{Exp}}$ correctly computes $\mathcal{F}_{\mathrm{Exp}}$ in the
$(\mathcal{F}_{\mathrm{QT}}, \mathcal{F}_{\mathrm{Conv}}, \mathcal{F}_{\mathrm{Mul}})$-hybrid
model if $2x < p'$ holds.


Before showing our formal proof, we point out some subtleties that arise in the proof.
The difference from the protocol $\Pi_{\mathrm{Exp}}$ is that $x$ is extended to $2x$ in
$\Pi'_{\mathrm{Exp}}$, since $\Pi_{\mathrm{QT}}$ only works on even numbers. (Here, since we
need to compute $2x$ exactly without reduction modulo $p'$, $2x < p'$ is required.) Thus,
we compute $(\sqrt{a})^{2x}$ instead of $a^x$ directly, where
$[2x]_{p'}^0 + [2x]_{p'}^1 = 2x + t \cdot p'$ ($t \in \{0,1\}$). In the following, it is
confirmed that in both cases we obtain the correct result $a^x \bmod p$ at the end of
$\Pi'_{\mathrm{Exp}}$.


Proof of Theorem 8. Here, we consider the case that $p \neq p'$ holds. (In the case of
$p = p'$, we just need to skip the process of $\Pi_{\mathrm{Conv}}$.) First,
$\Pi_{\mathrm{Conv}}$ changes $2x \in \mathbb{Z}_p$ to $2x \in \mathbb{Z}_{p'}$. In
$\Pi_{\mathrm{Conv}}$ on input $2x$, from the correctness of $\Pi_{\mathrm{QT}}$ and the
fact that the input $2x$ is an even number, we can get a correct value $t \in \{0,1\}$
satisfying $[2x]_{p'}^0 + [2x]_{p'}^1 = 2x + t \cdot p'$. From the correctness of
$\Pi_{\mathrm{Conv}}$, we obtain $[2x]_{p'}^0$ and $[2x]_{p'}^1$, where
$[2x]_{p'}^0 + [2x]_{p'}^1 = 2x \bmod p'$. Moreover, each party $P_i$ computes
$y_i = b^{[2x]_{p'}^i} \bmod p$ and shares $y_i$ with the other party, where
$b^2 = a \pmod{p}$. Thus, from the correctness of $\Pi_{\mathrm{Exp}}$ and the correctness
of $\Pi_{\mathrm{QT}}$, we can obtain a (correct) output
$o = b^{[2x]_{p'}^0 + [2x]_{p'}^1} \bmod p = a^x \bmod p$. (Theorem 8)
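An end-to-end simulation of Algorithm 4 helps to see how the pieces fit together. This is a sketch under explicit assumptions: `f_mul` is an idealized multiplication, the square root $b$ is supplied by the caller, and all names are illustrative. Because every step after the conversion is carried out over $\mathbb{Z}_{p'}$, the check in the usage note compares against $a^x \bmod p'$, which coincides with $a^x \bmod p$ when $p' = p$.

```python
import secrets

def share(x, q):
    s0 = secrets.randbelow(q)
    return [s0, (x - s0) % q]

def rec(s, q):
    return sum(s) % q

def f_mul(sa, sb, q):
    """Idealized F_Mul: open both inputs, multiply, reshare the product."""
    return share(rec(sa, q) * rec(sb, q) % q, q)

def pi_qt(sx, q_out):
    """Constrained QT (Algorithm 3): XOR of the share LSBs, shared over Z_{q_out}."""
    sb0, sb1 = [sx[0] & 1, 0], [0, sx[1] & 1]
    sc = f_mul(sb0, sb1, q_out)
    return [(sb0[i] + sb1[i] - 2 * sc[i]) % q_out for i in range(2)]

def pi_conv(sx, p, p_new):
    """Modulus conversion (Algorithm 2) instantiated with pi_qt."""
    st = pi_qt(sx, p_new)
    return [(sx[i] - st[i] * p) % p_new for i in range(2)]

def pi_exp(b, sv, q):
    """Exponentiation framework (Algorithm 1) over modulus q, using pi_qt."""
    sy0, sy1 = [pow(b, sv[0], q), 0], [0, pow(b, sv[1], q)]
    sd = f_mul(sy0, sy1, q)
    st = pi_qt(sv, q)
    s1mt = [(1 - st[0]) % q, (-st[1]) % q]
    so1, so2 = f_mul(s1mt, sd, q), f_mul(st, sd, q)
    inv_b = pow(b, -1, q)
    return [(so1[i] + so2[i] * inv_b) % q for i in range(2)]

def concrete_exp(b, sx, p, p_new):
    """Algorithm 4: double the exponent, convert if needed, exponentiate b."""
    s2x = [(2 * s) % p for s in sx]                        # local doubling
    sv = pi_conv(s2x, p, p_new) if p != p_new else s2x
    return pi_exp(b, sv, p_new)
```

For instance, with $a = 100$, $b = 10$, $p = 1019$ and $p' = 1031$, every $x$ with $2x < p$ yields shares that reconstruct to $100^x \bmod 1031$.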
Security. Finally, we prove the security of $\Pi'_{\mathrm{Exp}}$.


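Theorem 9. $\Pi'_{\mathrm{Exp}}$ securely computes $\mathcal{F}_{\mathrm{Exp}}$ in the
$(\mathcal{F}_{\mathrm{QT}}, \mathcal{F}_{\mathrm{Conv}}, \mathcal{F}_{\mathrm{Mul}})$-hybrid
model.


Proof of Theorem 9. From the UC-security of the underlying $\Pi_{\mathrm{Conv}}$
(Theorem 5), $\Pi_{\mathrm{QT}}$ (Theorem 7), and the underlying multiplication protocol,
we can easily see that $\Pi'_{\mathrm{Exp}}$ securely computes
$\mathcal{F}_{\mathrm{Exp}}$. (Theorem 9)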


Efficiency of Our Concrete Exponentiation Protocol. Here, we give an analysis of
the efficiency (round complexity and the number of invocations of multiplication) of
our concrete exponentiation protocol $\Pi'_{\mathrm{Exp}}$.

Firstly, we estimate the round complexity of $\Pi'_{\mathrm{Exp}}$. Recall that the round
complexity of our exponentiation framework, modulus conversion protocol, and constrained
QT protocol is $\max(r_{\mathrm{QT}}, r_{\mathrm{Mul}}) + r_{\mathrm{Mul}}$,
$r_{\mathrm{QT}}$, and $r_{\mathrm{Mul}}$ respectively, where $r_{\mathrm{QT}}$ and
$r_{\mathrm{Mul}}$ represent the round complexity of $\mathcal{F}_{\mathrm{QT}}$ and
$\mathcal{F}_{\mathrm{Mul}}$ respectively. That is, instantiating our exponentiation
framework and modulus conversion protocol by our constrained QT protocol, the round
complexity is $2 \cdot r_{\mathrm{Mul}}$ and $r_{\mathrm{Mul}}$, respectively. In the case
that we need to convert the modulus $p$ to another $p'$, since $\Pi'_{\mathrm{Exp}}$ calls
one modulus conversion protocol and one exponentiation framework, the round complexity of
our concrete exponentiation protocol is $3 \cdot r_{\mathrm{Mul}}$. In contrast, if we do
not need to convert the modulus $p$ to another $p'$, since $\Pi'_{\mathrm{Exp}}$ calls only
one exponentiation framework, the round complexity of our concrete exponentiation protocol
is $2 \cdot r_{\mathrm{Mul}}$. By using a standard multiplication protocol, the round
complexity of the multiplication protocol $r_{\mathrm{Mul}}$ is 1. Thus, the round
complexity of our concrete exponentiation protocol is 3 in the former case and 2 in the
latter case.

Secondly, we estimate the number of invocations of multiplication of
$\Pi'_{\mathrm{Exp}}$. Recall that the number of invocations of multiplication of our
exponentiation framework, modulus conversion protocol, and constrained QT protocol is
$i_{\mathrm{QT}} + 2 \cdot i_{\mathrm{Mul}}$, $i_{\mathrm{QT}}$, and $i_{\mathrm{Mul}}$,
respectively. That is, instantiating our exponentiation framework and modulus conversion
protocol by our constrained QT protocol, the number of invocations of multiplication is
3 and 1, respectively.
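Putting the counts above together, with $r_{\mathrm{Mul}} = i_{\mathrm{Mul}} = 1$ and the constrained QT protocol costing one round and one multiplication, the totals for the concrete protocol work out as follows (the subscripts "conv" and "no conv" are labels introduced here for the two cases):

```latex
\begin{align*}
\text{rounds}_{\text{conv}}          &= r_{\mathrm{QT}} + \bigl(\max(r_{\mathrm{QT}}, r_{\mathrm{Mul}}) + r_{\mathrm{Mul}}\bigr) = 1 + 2 = 3,\\
\text{rounds}_{\text{no conv}}       &= \max(r_{\mathrm{QT}}, r_{\mathrm{Mul}}) + r_{\mathrm{Mul}} = 2,\\
\text{mults}_{\text{conv}}           &= i_{\mathrm{QT}} + \bigl(i_{\mathrm{QT}} + 2 \cdot i_{\mathrm{Mul}}\bigr) = 1 + 3 = 4,\\
\text{mults}_{\text{no conv}}        &= i_{\mathrm{QT}} + 2 \cdot i_{\mathrm{Mul}} = 3.
\end{align*}
```

These are the figures of 3 rounds and 4 multiplications with modulus conversion, and 2 rounds and 3 multiplications without it, that the conclusion reports.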


4 Conclusion 


In this paper, we give a new two-party exponentiation protocol based on an additive 
secret sharing scheme which is compatible with well-known dishonest-majority MPC 
frameworks. The efficiency of our protocol is characterized by two cases. If we need 
modulus conversion in our protocol, it requires 3 rounds and 4 invocations of MPC 
multiplication. In contrast, it requires only 2 rounds and 3 invocations of MPC mul- 
tiplication if we do not need modulus conversion. The core techniques for obtaining 
our protocol are two-fold. One is an efficient constrained quotient transfer protocol 
which only works on even numbers without bit-decomposition. The other is an efficient 
modulus conversion protocol based on the above efficient quotient transfer protocol. 
We believe that these two primitives might be of independent interest and could have 
further applications. We leave it as future work to construct efficient unconstrained QT 
protocols that will remove the limitation 2x < p in our concrete protocol and investigate
how to extend our protocol to support non-integer values such as fixed-point numbers. 

Abstract. We present a novel compiler for transforming arbitrary, passively
secure MPC protocols into efficient protocols with covert security
and public verifiability in the honest majority setting. Our compiler
works for protocols with any number of parties > 2 and treats the passively
secure protocol in a black-box manner.

In multi-party computation (MPC), covert security provides an 
attractive trade-off between the security of actively secure protocols and 
the efficiency of passively secure protocols. In this security notion, honest 
parties are only required to detect an active attack with some constant 
probability, referred to as the deterrence rate. Extending covert secu- 
rity with public verifiability additionally ensures that any party, even 
an external one not participating in the protocol, is able to identify the 
cheaters if an active attack has been detected. 

Recently, Faust et al. (EUROCRYPT 2021) and Scholl et al. (Pre- 
print 2021) introduced similar covert security compilers based on com- 
putationally expensive time-lock puzzles. At the cost of requiring an 
honest majority, our work avoids the use of time-lock puzzles completely. 
Instead, we adopt a much more efficient publicly verifiable secret shar- 
ing scheme to achieve a similar functionality. This obviates the need for 
a trusted setup and a general-purpose actively secure MPC protocol. 
We show that our computation and communication costs are orders of 
magnitude lower while achieving the same deterrence rate. 

1 Introduction


Multi-party computation (MPC) is a subfield of cryptography allowing a set 
of mutually distrusting parties to jointly compute functions over their inputs 
without revealing anything but the outcome of the computation. 

This way, nothing more can be deduced about the inputs of other parties 
than what could be deduced from the outcome of the computation alone. Traditionally,
two types of adversaries have been considered in MPC: passive and
active adversaries. Passive adversaries try to deduce as much private information 
as possible but follow the protocol honestly. Active adversaries are additionally 
allowed to arbitrarily deviate from the protocol, which might also compromise 
the correctness of the outcome. In general, passively secure protocols are fast 
but might not be considered secure in many realistic scenarios unless there is a 
good reason to assume that an untrusted party will not deviate from the proto- 
col. Actively secure protocols are very secure in this regard, but active security 
comes at the cost of increasing the communication and computation complexity. 

As a trade-off between the benefits of these two notions, covert security 
was introduced by Aumann and Lindell in 2007 [3]. Instead of safeguarding the 
protocol against an active attack, the idea of this notion is that it is sufficient 
to only detect the attack with a certain probability, called the deterrence rate $\epsilon$.
Usually the deterrence rate can be chosen arbitrarily, thus providing a dynamic 
trade-off between the efficiency and security of passively and actively secure 
protocols, respectively. Goyal, Mohassel and Smith [13] presented a covertly 
secure version of garbled-circuit based MPC protocols [5] and Damgård et al. [9]
introduced a cheap cut-and-choose approach for an efficient and covertly secure 
offline phase for the SPDZ protocol [11], replacing costly zero-knowledge proofs 
required for active security. While this notion has led to promising results, in 
2012 Asharov and Orlandi [2] observed that it might not be sufficient for practical 
applications. If a party detects a cheating attempt, there is in general no way 
of proving that another party has acted maliciously. Therefore they introduced 
the extended notion of publicly verifiable covert security. This property equips 
the parties with a mechanism to generate a certificate that proves a cheating 
attempt to anyone, including external parties not participating in the MPC 
protocol. Even though this notion looks promising for wider use in practice, 
relatively little research has been done in this area. The only concrete protocols 
in this security model have been presented in [2,15,16]. 

Another line of research is the trade-off between the number of corruptions a 
protocol can tolerate and efficiency, again giving up some security by tolerating 
fewer corruptions to achieve a more efficient protocol. A popular relaxation in
the literature is the assumption of an honest majority, meaning that more than half
of the parties are guaranteed to behave honestly. Concrete protocols with active 
security and only sublinear overhead in the honest majority model have been 
presented in [6, 14]. 

To ease the development of MPC with stronger security guarantees, compil- 
ers were introduced. Compilers allow for a modular approach to cryptographic 
protocol design; they provide a generic transformation from protocols with cer- 
tain (security) properties to protocols with stronger properties. For instance, 
covert/active security compilers take as input a passively secure MPC protocol
and output a protocol with covert/active security. The focus of this work will be 
on compiling passively secure protocols into efficient protocols with covert secu- 
rity and public verifiability for any passively secure protocol with an arbitrary 
number of parties n > 2. 

Many MPC protocols proceed as follows. They first run an input-independent
pre-processing phase to set up some correlated randomness, e.g., Beaver
triples [4]. Because this phase can be executed before the secret input values are 
available, it is also referred to as the offline phase. This pre-processing allows 
the actual computation, the online phase, to be executed very efficiently. Since 
actively secure online phases nowadays are quite efficient already, we specifically 
target our compiler towards the more expensive pre-processing protocols. As was 
proven in [10], combining a covertly secure pre-processing protocol with public 
verifiability and an actively secure online phase yields an overall protocol with 
covert security and public verifiability. Therefore, our compiler could for example 
be used to replace the actively secure pre-processing step of the SPDZ protocol 
with a covertly secure one from our compiler and combine it with the actively 
secure online phase of SPDZ [11] to improve the overall efficiency. 
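
To illustrate why correlated randomness makes the online phase cheap, the following minimal sketch multiplies two additively shared values using one Beaver triple. It is a passively secure toy, not the SPDZ protocol: the trusted-dealer function `offline_triple`, the modulus and all names are assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # prime modulus (a Mersenne prime), chosen only for illustration


def share(x, n):
    """Additively share x among n parties over Z_P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts


def reconstruct(shares):
    return sum(shares) % P


def offline_triple(n):
    """Trusted-dealer stand-in for pre-processing: shares of (a, b, c = a * b)."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a, n), share(b, n), share(a * b % P, n)


def online_multiply(x_sh, y_sh, triple):
    """Online phase: one opening round, then only local operations."""
    a_sh, b_sh, c_sh = triple
    n = len(x_sh)
    # Open d = x - a and e = y - b; a and b are uniformly random masks.
    d = reconstruct([(x_sh[i] - a_sh[i]) % P for i in range(n)])
    e = reconstruct([(y_sh[i] - b_sh[i]) % P for i in range(n)])
    # x * y = c + d * b + e * a + d * e; the public term d * e is added once.
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % P for i in range(n)]
    z_sh[0] = (z_sh[0] + d * e) % P
    return z_sh


x_sh, y_sh = share(6, 3), share(7, 3)
print(reconstruct(online_multiply(x_sh, y_sh, offline_triple(3))))  # 42
```

All the expensive work sits in producing the triple, which is why a compiler that targets the pre-processing phase addresses the dominant cost.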

Typically, covert security is obtained by a cut-and-choose strategy where the 
passively secure protocol is simply executed multiple times after which some of 
these executions are “opened” to verify the behavior of the parties. An important 
predicament to overcome for public verifiability is the prevention of a detection- 
dependent abort. This means that an adversary should not be able to prevent 
the generation of a certificate once it sees its cheating attempt is going to be 
detected. The first covert security compiler without public verifiability was presented
by Damgård, Geisler and Nielsen in 2010 [8]. Their approach is based
on the assumption of an honest majority of participants. A covert security compiler
with public verifiability, secure against any number of corruptions, was first
presented by Damgård et al. in 2020 [10]. They presented two compilers in the
2-party case: one for input-independent protocols and one for input-dependent
protocols. Furthermore, they sketch how to extend their approach to arbitrary 
numbers of parties. To prevent a detection-dependent abort, detecting active 
attacks is done by letting each party independently and obliviously choose which 
executions it wants to verify. To guarantee for a constant number of $k$ executions
that at least one execution remains closed, the number of executions that can
be chosen by each party (and hence the deterrence rate) decreases for increasing
numbers of parties. Concretely, each party can choose at most $\frac{k-1}{n}$ executions
and thus obtains $\epsilon = \frac{k-1}{kn}$.

Constructions with a constant deterrence rate for any number of parties have 
been presented by Faust et al. [12] and concurrently by Scholl et al. [19]. Both 
works follow a shared coin toss (SCT) strategy. With this strategy, the parties 
together toss a coin to determine which executions will be verified by everyone,
guaranteeing maximal deterrence rates regardless of the number of parties. To
prevent a detection-dependent abort, both [12] and [19] use time-lock puzzles 
(TLP) to lock the potential evidence before the coin toss such that the honest 
parties are guaranteed its availability in case the adversary aborts after seeing 
the outcome of the coin toss. A TLP hides a secret message and solving the
TLP reveals this message. Moreover, solving a TLP is guaranteed to require a 
fixed amount of work. A TLP therefore guarantees that a message is hidden for 
a fixed amount of time and that it can be revealed after this fixed amount of 
time. 
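
For intuition, the sketch below shows a toy time-lock puzzle in the style of the classic repeated-squaring construction of Rivest, Shamir and Wagner. This is an assumption for illustration: the text does not specify which TLP the cited compilers use, and the parameters here are far too small to be secure. The creator uses the factorization of N as a shortcut, while the solver must perform t sequential squarings.

```python
import hashlib


def _pad(key: int, length: int) -> bytes:
    """Derive a one-time pad of the given length from the puzzle key."""
    return hashlib.shake_256(str(key).encode()).digest(length)


def make_puzzle(message: bytes, t: int, p: int, q: int, a: int = 2):
    """Lock `message` so that opening it takes t sequential squarings mod N = p * q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    # Shortcut available only to the creator, who knows phi(N):
    # a^(2^t) mod N equals a^(2^t mod phi(N)) mod N when gcd(a, N) = 1.
    key = pow(a, pow(2, t, phi), n)
    pad = _pad(key, len(message))
    return n, a, t, bytes(m ^ k for m, k in zip(message, pad))


def solve_puzzle(puzzle) -> bytes:
    """Recover the message without the factorization: t squarings, one after another."""
    n, a, t, ciphertext = puzzle
    key = a % n
    for _ in range(t):
        key = key * key % n
    pad = _pad(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


# Toy parameters: two well-known Mersenne primes stand in for secret RSA primes.
puzzle = make_puzzle(b"seed opening", t=1000, p=2**31 - 1, q=2**61 - 1)
print(solve_puzzle(puzzle))  # b'seed opening'
```

The parameter t is exactly the "amount of work" discussed next: it has to be tuned against the adversary's squaring speed, which is the source of the timing issues described below.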

However, the time-locks introduce strict timing assumptions, which cause
subtle issues in practice when used for this application. The entire security
against a detection-dependent abort in these works relies on the assumption 
that the TLP is hidden for a few synchronous communication rounds. In theory, 
the synchronous communication model ensures that the parties communicate 
in fixed rounds through a global clock. In practice, this is typically realized by 
picking a certain timeout after which all messages for a round should have been 
received. With the TLP approach, if the amount of work required for solving the 
TLP is picked too low, an adversary has a higher probability of solving the TLP 
early and performing a detection-dependent abort. On the other hand, by picking
a larger amount of work, the complexity for the honest parties to solve the TLP 
becomes undesirably high. The TLPs only need to be solved in case of misbe- 
havior, so using an extremely complex puzzle could be acceptable to decrease 
the probability of the adversary solving the TLP too early. However, since we 
cannot make assumptions about the power of the adversary, it is still impossible 
to guarantee the security of the TLP and thus secrecy of the underlying message 
for a small number of communication rounds. 

Furthermore, both TLP approaches require the availability of a general- 
purpose, actively secure MPC protocol to realize a trusted setup and implement 
an ideal functionality that constructs the TLP. This seems counterintuitive in a 
setting where the goal is to increase the security of a passively secure protocol 
through compilation. Furthermore, these functionalities prove to be very costly. 


1.1 Contributions 


In this work, we introduce a novel and efficient covert security compiler with 
public verifiability in the honest majority setting. 

Our approach is based on the covert security compilers with public verifia- 
bility presented in [10,12,19]. We adapt their constructions and use a publicly 
verifiable secret sharing scheme (PVSS) to replace the costly time-lock puzzles 
(TLP). Compared to [10], our compilers yield much higher deterrence rates in 
the multi-party setting. This is achieved by following a shared coin toss (SCT) 
strategy, similar to the compilers of [12,19]. More precisely, for any number of 
executions $k$ of the passive protocol, a deterrence rate of $1 - \frac{1}{k}$ can be achieved
independent of the number of parties n. The public verifiability of the compilers 
of [12,19] is based on the use of TLPs to ensure availability of potential evidence 
after the coin toss. In contrast, we adopt a PVSS to distribute the evidence 
among all the parties. Due to the honest-majority assumption, the PVSS can 
be instantiated such that the adversary corrupting fewer than n/2 parties cannot
reconstruct this secret evidence prematurely, while the honest parties are able 
to reconstruct. The prior works of [12,19] do however provide security against a 
dishonest majority. 
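
The difference between the two strategies can be made concrete with a short calculation. The sketch assumes that, with independent choices, the n parties together may open at most k - 1 of the k executions, so each party checks at most (k - 1)/n of them, whereas with a shared coin toss k - 1 executions are checked by everyone. The function names are illustrative.

```python
from fractions import Fraction


def deterrence_independent_choice(k: int, n: int) -> Fraction:
    """Each of n parties checks at most (k - 1) / n of the k executions."""
    return Fraction(k - 1, k * n)


def deterrence_shared_coin(k: int) -> Fraction:
    """All parties check the same k - 1 executions selected by the coin toss."""
    return Fraction(k - 1, k)


for n in (2, 5, 10):
    print(n, deterrence_independent_choice(10, n), deterrence_shared_coin(10))
# 2 9/20 9/10
# 5 9/50 9/10
# 10 9/100 9/10
```

With k = 10 executions, the shared coin toss keeps the deterrence rate at 9/10 for every n, while the independent-choice rate shrinks as parties are added.
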

With our adaptation, we remove the need for a trusted setup and an actively
secure puzzle generation. We show that as a result, both the computation and 
communication complexity of our compiler decrease by multiple orders of mag- 
nitude. Moreover, an efficient and secure TLP instantiation for the purpose of 
achieving public verifiability requires an accurate estimation of the adversary’s
computational resources. Therefore, in this application, it is inherently difficult 
to instantiate a TLP appropriately. For these reasons, our approach, avoiding 
TLPs altogether, provides security against a more realistic adversary model. 

Our compiler makes black-box use of the passively secure protocol and can 
therefore enhance the security of any passively secure protocol, including future 
protocols. In [14] and [6], active security is obtained by adapting a specific secret- 
sharing based protocol and requires a stronger security notion than plain passive 
security. Therefore, these protocols are incomparable to our compiler. 


1.2 Technical Overview 


Covert Security. Covert security is obtained in a similar fashion to related con- 
structions, where active cheating is usually detected by some cut-and-choose 
mechanism. More precisely, the passively secure protocol is executed k times 
after which t < k executions are opened to verify the behavior of the parties. 
Opening an execution is done by revealing the randomness used by each party 
during an execution of the protocol. Note that in this work we are specifically 
targeting input-independent protocols and hence the behavior of a party is com- 
pletely determined by the (publicly known) protocol description and the ran- 
domness used. Given the randomness of the other parties, each party can replay 
the protocol execution and verify the behavior of the other parties during the 
actual protocol execution. If no deviations are detected, the result of one of the 
unopened executions can then be picked as the output of the protocol. However, 
this approach still allows a dishonest party to decide which randomness to reveal 
after learning which executions are to be opened, i.e., there is no guarantee that 
the revealed randomness was used during the executions. To prevent this, the 
parties are required to commit to their randomness before the protocol execu- 
tion. This technique was introduced by Hong et al. [15] and is also referred to 
as derandomization. After the k parallel executions, the parties perform a joint 
coin toss outputting an integer $1 \le i \le k$ indicating the protocol execution that
is to be used as output. The remaining $k - 1$ executions are opened and the
parties verify each other’s behavior. 
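
The cut-and-choose flow described above can be sketched for a single party as follows. This is a simplified illustration rather than the compiler itself: the hash-based commitment, the toy `run_execution` function and the local stand-in for the joint coin toss are all assumptions, and executions are indexed from 0 instead of 1.

```python
import hashlib
import secrets


def commit(seed: bytes):
    """Hash-based commitment (an illustrative stand-in for Com)."""
    r = secrets.token_bytes(32)
    return hashlib.sha256(r + seed).digest(), (seed, r)


def open_commitment(c: bytes, d):
    """Return the committed seed, or None if the opening does not match c."""
    seed, r = d
    return seed if hashlib.sha256(r + seed).digest() == c else None


def run_execution(seed: bytes) -> bytes:
    """Toy input-independent execution: the messages depend only on the seed."""
    return hashlib.sha256(b"execution" + seed).digest()


def cut_and_choose(k: int, cheat_in=None, coin=None):
    """Run k executions, keep one closed, open and replay the other k - 1."""
    seeds = [secrets.token_bytes(16) for _ in range(k)]
    committed = [commit(s) for s in seeds]            # commit before executing
    transcripts = [run_execution(s) for s in seeds]
    if cheat_in is not None:
        transcripts[cheat_in] = b"deviating message"  # simulated active deviation
    # Stand-in for the joint coin toss selecting the output execution.
    i = secrets.randbelow(k) if coin is None else coin
    for j, (c, d) in enumerate(committed):
        if j == i:
            continue                                  # the output execution stays closed
        seed = open_commitment(c, d)
        if seed is None or run_execution(seed) != transcripts[j]:
            return ("cheating detected", j)
    return ("output", i)


print(cut_and_choose(5, cheat_in=2, coin=0))  # ('cheating detected', 2)
print(cut_and_choose(5, cheat_in=2, coin=2))  # ('output', 2)
```

A deviation in one execution goes unnoticed only when the coin happens to select that execution as the output, which occurs with probability 1/k.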


Public Verifiability. Public verifiability is obtained by making each party 
accountable for its messages by letting them sign all the messages they send 
during the protocol executions. If it is later detected that a party has sent an 
incorrect message, anyone can verify that this party must have sent the malicious 
message. It is essential to prevent a so-called detection-dependent abort, meaning 
an adversary cannot prevent the generation of a certificate once it sees it is going 
to be detected. To prevent this, we “lock” the randomness used by sharing it 
among all parties using a PVSS before the coin toss. If an adversary aborts after 
the coin toss, the parties have enough shares to reconstruct the randomness and
verify behavior anyway. Here the honest majority assumption is required to
guarantee enough honest shares for reconstructing each randomness while the 
adversary cannot get hold of enough shares to reconstruct the randomness used 
in the output execution. 
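
The role of the honest majority can be illustrated with plain Shamir secret sharing, without the public verifiability layer. The sketch assumes a polynomial of degree t = (n - 1) // 2, so that any t + 1 parties can reconstruct while t or fewer shares reveal nothing. This threshold choice, the field and the names are assumptions made for illustration.

```python
import secrets

P = 2**127 - 1  # prime field modulus (a Mersenne prime), for illustration only


def share(secret: int, n: int, t: int):
    """Shamir sharing with a random degree-t polynomial f, where f(0) = secret."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t)]
    return {i: sum(c * pow(i, j, P) for j, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)}


def reconstruct(shares: dict) -> int:
    """Lagrange interpolation at 0 (requires Python 3.8+ for pow(x, -1, P))."""
    total = 0
    for i, y in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + y * num * pow(den, -1, P)) % P
    return total


n = 5
t = (n - 1) // 2          # the adversary corrupts at most t < n / 2 parties
shares = share(123456789, n, t)
honest = {i: shares[i] for i in (1, 3, 5)}   # any t + 1 = 3 honest parties
print(reconstruct(honest))  # 123456789
```

Because at least n - t parties are honest and n - t is at least t + 1, the honest parties can always unlock the randomness after an abort, while the corrupted minority alone holds too few shares to open the output execution early.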

Note that we cannot simply accuse a party who aborts at this stage, as
there is no publicly verifiable evidence of this (such as a signature of a party
on a malicious message). An external party who wishes to verify the protocol
execution cannot distinguish between an actual active attack and an accident
such as a network failure. Therefore, this straightforward approach could lead
to an honest party being unjustly punished, or could be denied by an adversary who
claims that he was not at fault.


Public Verifiability from PVSS. By using a PVSS, the parties are guaranteed 
to be able to prove that an active attack occurred. To this end the parties can
first use the PVSS to verify all the randomness shares distributed by each party 
before the coin toss. In case the verification of some share fails, anyone can verify 
that the adversary attempted to cheat by distributing inconsistent shares. If the 
verifications succeed, the shares are guaranteed to reconstruct to a well-defined 
value, namely the randomness of the distributor. The distributor is furthermore 
committed to this randomness by a proof of correct distribution. This verification 
can be performed without interaction with the distributor or any of the other 
parties. Therefore, verification can also be done by external parties, making it 
publicly verifiable. 

In case an adversary aborts after the coin toss, the parties can combine their 
shares to reconstruct the randomness of the adversary. During this reconstruction 
phase, each party is required to also publish a proof of correct decryption. This 
way, the honest parties can combine only correct shares to reconstruct the value 
originally distributed. Furthermore, an adversary cannot ‘incriminate’ an honest 
party by publishing a different share than distributed by the honest party. As 
an additional benefit of the PVSS strategy, the protocol can still continue and 
succeed in case an otherwise honest party is not able to deliver this information 
in time. 


2 Preliminaries 


Our compiler uses several building blocks. As cryptographic building blocks, 
the compiler uses a commitment scheme (Com, Open) and a signature scheme 
(Gen, Sign, Verify). Throughout this work, the commitment scheme is assumed 
to be non-interactive, but our compiler could trivially be instantiated with an 
interactive commitment scheme as well. Committing to a message $m$ with randomness
$r$ will be denoted by $(c, d) \leftarrow \mathrm{Com}(m; r)$, where $c$ is the resulting commitment
and $d = (m, r)$ the opening information. Opening a commitment is
then denoted by $m' \leftarrow \mathrm{Open}(c, d)$. For a correct opening, we get that $m' = m$,
and $m' = \bot$ otherwise. The commitment scheme should satisfy the hiding and
binding properties.

The signature scheme should be existentially unforgeable against chosen-message
attacks. Before the protocol execution, all parties are expected to generate
a public-private key pair (pk, sk) using Gen and register their public key. Signing
a message $m$ using a private key sk is denoted as $\sigma \leftarrow \mathrm{Sign}_{sk}(m)$. Verifying
a signature using the corresponding public key is denoted as
$\{\mathrm{accept}, \bot\} \leftarrow \mathrm{Verify}_{pk}(m, \sigma)$.


2.1 Multi-party Computation 


The goal of Multi-Party Computation (MPC) protocols is to allow a group of 
$n$ participants $P = \{P_1, \ldots, P_n\}$ to compute a shared function $f$ over their private
inputs $x_1, \ldots, x_n$ while keeping their inputs hidden from each other. This
group of participants can be divided in two sets: honest participants and corrupt 
participants. The honest participants will strictly follow the protocol description 
while corrupt participants are assumed to be under the influence of a central 
adversary. 

In general for an MPC protocol to be considered secure, it needs to satisfy 
two requirements: privacy and correctness. Privacy means that an adversary is 
not able to learn more than what it can deduce from its own inputs and the 
output of the protocol. Particularly the adversary should not be able to gain 
any additional information about the inputs of the honest parties. Correctness 
means that the outcome of the protocol received by the honest parties should 
be correct. To reason about the security of such a protocol in the presence of 
an adversary, we follow the standard real/ideal world paradigm to show that 
our protocol in the real world is indistinguishable from an ideal execution of the 
same functionality. Informally, this paradigm specifies an ideal functionality $\mathcal{F}$
and proves that the MPC protocol $\Pi$ implements exactly this ideal execution
and is thus as secure as the ideal world.

In this work we assume that the adversary $\mathcal{A}$ with auxiliary input $z$ can
statically corrupt a set $A \subset P$ of the parties with $|A| < \frac{n}{2}$. Furthermore, let
$\Pi : (\{0,1\}^*)^n \to (\{0,1\}^*)^n$ be the real-world protocol computing functionality
$f$, taking one input per party, $\{x_1, x_2, \ldots, x_n\} = \bar{x}$, and returning one output to
each party. We define the outputs of the honest parties and $\mathcal{A}$ in a real-world
execution of $\Pi$ as $\mathrm{REAL}_{\lambda}[\mathcal{A}(z), A, \Pi, \bar{x}]$, where $\lambda$ is the security parameter.

In the passive security model, an ideal world adversary S is assumed to try 
to deduce as much information as possible while honestly participating in the 
protocol. On the other hand, active adversaries may arbitrarily deviate from the 
protocol in order to try to deduce more information or break the correctness of 
the outcome. 


Covert Security. The idea of covert security is to assume an adversary who is
capable of performing an active attack, but for whom a certain probability of being
caught cheating is enough to deter him from doing so. This probability of being caught
is called the deterrence rate $\epsilon$.

In this work, we follow the strongest definition for covert security originally 
defined by Aumann and Lindell [3], called the strong explicit cheat formulation (SECF). The ideal
functionality for calculating a function $f$ in the presence of covert adversaries
according to this definition will be called $\mathcal{F}_{Covert}$. This functionality allows $\mathcal{S}$
to perform cheating like an active adversary. With a probability of $\epsilon$, $\mathcal{F}_{Covert}$
informs all the parties of a cheating attempt. With a probability of $1 - \epsilon$, a
cheating attempt is successful, in which case $\mathcal{S}$ learns the inputs of all the parties
and may decide their outputs. For readability of our protocols, we slightly alter
the original SECF definition to not require identifiable abort. Due to the honest
majority assumption this can, however, easily be obtained by adding a Byzantine
agreement at the end of the protocol. The formal definition of $\mathcal{F}_{Covert}$ can be
found in the full version of this paper [1].

The joint distribution of the outputs of the honest parties and the ideal-world 
adversary $\mathcal{S}$ (with auxiliary input $z$) is denoted as $\mathrm{IDEAL}^{\epsilon}_{\lambda}[\mathcal{S}(z), A, \mathcal{F}_{Covert}, \bar{x}]$.
Covert security can now be defined as follows, where $\equiv_c$ denotes computational
indistinguishability:


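Definition 1 (Covert security with deterrence rate $\epsilon$). A protocol $\Pi$
securely computes $\mathcal{F}_{Covert}$ with deterrence rate $\epsilon$ if for every real-world adversary
$\mathcal{A}$, we can find an ideal-world adversary $\mathcal{S}$ such that for all security parameters
$\lambda \in \mathbb{N}$:


$$\{\mathrm{IDEAL}^{\epsilon}_{\lambda}[\mathcal{S}(z), A, \mathcal{F}_{Covert}, \bar{x}]\}_{\bar{x}, z \in \{0,1\}^*} \equiv_c \{\mathrm{REAL}_{\lambda}[\mathcal{A}(z), A, \Pi, \bar{x}]\}_{\bar{x}, z \in \{0,1\}^*}.$$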


Public Verifiability. As an extension to covert security, the notion of pub- 
licly verifiable covert security (PVC) was proposed by Asharov and Orlandi in 
2012 [2]. This form of security provides the parties with a mechanism to gener- 
ate a publicly verifiable certificate in case cheating is detected. This certificate 
proves to anyone that a certain party attempted to cheat during the protocol. 

We use the approach of [15] where a Judge algorithm is added to a real-world 
protocol $\Pi$. If, in the execution of $\Pi$, cheating is detected, the protocol outputs
a certificate cert. The Judge algorithm verifies this certificate and outputs the
public key (the “identity”) of the cheater if it is valid. The vector of public keys
is defined as pk = $(pk^1, \ldots, pk^n)$, corresponding to the parties $P_i$. Furthermore, we
have extracted the verification procedure of the protocol to a separate Blame
algorithm. Blame takes the view of a party $P_i$, returns a certificate cert and
outputs $\mathrm{corrupted}_j$ in case party $P_j$ is found to be cheating. Formally, we define
covert security with public verifiability as: 


Definition 2 (Covert security with deterrence rate $\epsilon$ and public verifiability).
A protocol ($\Pi$, Blame, Judge) securely computes $\mathcal{F}_{Covert}$ with a deterrence
rate of $\epsilon$ and public verifiability if the following three conditions hold:


- Covert security: $\Pi$ is secure against a covert adversary according to Definition 1
for covert security with deterrence rate $\epsilon$. Additionally, $\Pi$ might now
output cert in case cheating is detected.

- Public Verifiability: If an honest party $P_i$ detects cheating by another party
$P_j$ and outputs cert in an execution of $\Pi$, then Judge(pk, F, cert) = $pk^j$
except with negligible probability.

- Defamation-Freeness: If party $P_i$ is honest and executes $\Pi$ in the presence
of an adversary $\mathcal{A}$, then the probability that $\mathcal{A}$ creates cert* such that
Judge(pk, F, cert*) = $pk^i$ is negligible.


2.2 Publicly Verifiable Secret Sharing 


Verifiable secret sharing (VSS) [7] is an extension of regular secret-sharing that 
provides additional security against active attacks. VSS protects honest par- 
ties against malicious participants by equipping the secret sharing scheme with 
mechanisms to (i) verify that they received consistent shares from an untrusted 
dealer and (ii) verify that they received the correct shares from the other parties 
during reconstruction. With publicly verifiable secret sharing (PVSS) [18,21], 
properties (i) and (ii) can be verified by anyone, also parties outside the secret 
sharing protocol, without any interaction. In general, a PVSS can be instantiated 
from any secret sharing scheme with an arbitrary access structure $\mathbb{A}$. For this
work, a threshold access structure such as realized with Shamir’s secret sharing 
scheme [20] is sufficient. In this work, we require the PVSS to satisfy the defi- 
nition first presented by Schoenmakers [18], which adds an additional proof of 
correct decryption: 


Definition 3 (PVSS Scheme). A PVSS scheme with a set of players $P$ and
access structure $\mathbb{A} \subseteq 2^{P}$ consists of the following three algorithms:


- $(\{E_i(s_i)\}_{i \in P}, \mathrm{dproof}) \leftarrow \mathrm{Distribute}(s)$: The distribution algorithm takes as
input a secret $s$ and publishes a set of encrypted shares $\{E_i(s_i)\}_{i \in P}$ and some
public distribution proof dproof.

- $\{\mathrm{true}, \bot\} \leftarrow \mathrm{Verify}(\mathrm{dproof}, E_i(s_i))$: The verification algorithm takes as
input a distribution proof and an encrypted share $E_i(s_i)$ and outputs true if
$E_i(s_i)$ encrypts a valid share $s_i$ of $s$ according to dproof.

- $s' \leftarrow \mathrm{Reconstruct}(\{\mathrm{rproof}_i, s_i\}_{i \in A})$: The reconstruction algorithm takes a
set of decrypted shares $s_i$, $i \in A$, and corresponding decryption proofs of some
subset $A \subseteq P$ and outputs the reconstructed value $s'$. In case $A \in \mathbb{A}$, we call
$A$ a qualified subset and as a result, $s' = s$ if the verifications of the encrypted
shares succeeded according to the proofs.


Here, it is assumed that we already have a registered public key of all the 
participants. Instead of generating and distributing the secrets directly, a dealer 
publishes encrypted shares $E_i(s_i)$ with the known public keys of each party
$P_i$. Furthermore, the dealer publishes a string dproof which shows that each
$E_j$ encrypts a consistent share $s_j$. This proof also commits the dealer to the
value of the secret $s$ and guarantees that no one can wrongly claim to have
received a wrong share since anyone can verify this. In this work we will abuse
notation and let $\mathrm{Verify}(\mathrm{dproof}, \{E_i(s_i)\}_{i \in P})$ denote the verification of all shares
destined for the parties in $P$ of the same secret $s$. Now, true is interpreted as
all verifications succeeding while $\bot$ means at least one verification failed. If the
reconstruction succeeds, we are guaranteed that this is the original secret $s$.
During the reconstruction phase, the parties decrypt and publish their shares $s_i$
from $E_i(s_i)$ along with a string $\mathrm{rproof}_i$, which shows that they performed the
decryption correctly. Using these, the other parties can now exclude the shares
of participants who failed to decrypt correctly. If enough decryptions (+1) pass
the verification, the parties can reconstruct the original secret successfully.
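
The idea of checking shares against a public distribution proof can be illustrated with Feldman-style verifiable secret sharing. This is a simpler, well-known relative of PVSS and not the scheme of [18]: shares are not encrypted here, so only the consistency check against dproof is shown. The tiny group parameters and all names are assumptions for illustration and offer no security.

```python
import secrets

Q = 1019        # prime order of the subgroup (toy size)
P = 2 * Q + 1   # 2039, a safe prime
G = 4           # generates the subgroup of order Q in Z_P^*


def distribute(secret: int, n: int, t: int):
    """Share `secret` with a degree-t polynomial; dproof commits to its coefficients."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]
    shares = {i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
              for i in range(1, n + 1)}
    dproof = [pow(G, c, P) for c in coeffs]
    return shares, dproof


def verify(dproof, i: int, share_i: int) -> bool:
    """Check G^share_i against the committed polynomial evaluated at i, in the exponent."""
    expected = 1
    for j, commitment in enumerate(dproof):
        expected = expected * pow(commitment, pow(i, j, Q), P) % P
    return pow(G, share_i, P) == expected


shares, dproof = distribute(123, n=5, t=2)
print(all(verify(dproof, i, s) for i, s in shares.items()))  # True
print(verify(dproof, 1, (shares[1] + 1) % Q))                # False
```

Anyone holding dproof can run `verify`, with no interaction with the dealer. A PVSS additionally lets this check be performed on the encrypted shares and adds decryption proofs during reconstruction.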

We require the PVSS to satisfy the correctness, soundness and privacy secu- 
rity guarantees. 


Definition 4 (Correctness). If a dealer honestly follows the Distribute
algorithm to publish the encrypted shares $\{E_i(s_i)\}_{i \in P}$ and a public proof dproof,
then the outcome of $\mathrm{Verify}(\mathrm{dproof}, E_i(s_i))$ is guaranteed to be true. Furthermore,
if during reconstruction a party $P_i$ honestly decrypts $E_i(s_i)$, publishes its
share $s_i$ and honestly generates the proof $\mathrm{rproof}_i$, then another honest party
receiving the decrypted share $s_i$ and $\mathrm{rproof}_i$ accepts this share. Finally, a qualified
subset $A \subseteq P$ is guaranteed to reconstruct the original secret $s$ if the dealer
and the parties in $A$ honestly follow the Distribute and Reconstruct protocols.


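Definition 5 (Soundness). If $\mathrm{Verify}(\mathrm{dproof}, E_i(s_i)) = \mathrm{true}$, then for all
qualified subsets $A_1, A_2 \subseteq P$, the following holds:


$$\mathrm{Reconstruct}(\{\mathrm{rproof}_i, s_i\}_{i \in A_1}) = \mathrm{Reconstruct}(\{\mathrm{rproof}_i, s_i\}_{i \in A_2}).$$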


Furthermore, if a malicious party submits a fake share during reconstruction, 
verification of this share fails with an overwhelming probability. 


Definition 6 (Privacy). An adversary corrupting a set of participants $A$ such
that $|A| < t$ should not be able to learn anything about the secret $s$ from the shares
$s_i$ with $i \in A$.


3 Building Blocks 


In this section, we will introduce the basic building blocks of our PVC com- 
piler. The PVC compiler uses a public bulletin board and a public coin tossing 
functionality. Furthermore, our compiler slightly modifies the passively secure 
protocol execution. 


Public Bulletin Board. For public communication required by the PVSS, we 
model an ideal functionality $\mathcal{F}_{BB}$, which represents a public bulletin board. A
formal description of this functionality can be found in the full version of this
paper [1]. The public bulletin board functionality guarantees that the honest par- 
ties agree on all the messages that have been sent. In practice, this functionality 
could be realized using the echo broadcast protocol of [17]. 

Functionality $\mathcal{F}_{Coin}$: Ideal coin-tossing functionality

- Consider a number of parties $P_1, P_2, \ldots, P_n$.
- If $\mathcal{F}_{Coin}$ receives a message (flip) from $P_i$, it stores (flip, $P_i$) in memory if it is
  not stored in memory yet.
- Once $\mathcal{F}_{Coin}$ has stored all the messages (flip, $P_i$) for $i \in [n]$, $\mathcal{F}_{Coin}$ picks a random
  value $r \in_R \{0,1\}^{\lambda}$ and sends (flip, $r$) to all the parties.


Coin Tossing. An ideal functionality $\mathcal{F}_{Coin}$ receives $\mathrm{ok}_i$ from each party $P_i$, $i \in [n]$,
and outputs a random $\lambda$-bit string $r$ to all the parties. The adversary should not
be able to influence the outcome of the coin-tossing protocol. Therefore, we 
require a coin-tossing protocol with security against an active adversary A. 
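
One common pattern for tossing a coin among mutually distrusting parties is commit-then-reveal: every party commits to a random string, all commitments are published, and the XOR of the opened strings is the result. The sketch below illustrates only this pattern. It is an assumption rather than the protocol used by the compiler: it runs all parties in one process and ignores aborts, which a full realization of the coin-tossing functionality has to handle. All names are hypothetical.

```python
import hashlib
import secrets
from functools import reduce

LAMBDA_BYTES = 16  # length of the coin in bytes (illustrative choice)


def commit(value: bytes):
    """Hash-based commitment to `value`; returns (commitment, opening randomness)."""
    r = secrets.token_bytes(32)
    return hashlib.sha256(r + value).digest(), r


def check(c: bytes, value: bytes, r: bytes) -> bool:
    return hashlib.sha256(r + value).digest() == c


def coin_toss(n: int) -> bytes:
    """Simulate n parties: commit to contributions, reveal them, XOR the results."""
    contributions = [secrets.token_bytes(LAMBDA_BYTES) for _ in range(n)]
    commitments = [commit(v) for v in contributions]   # round 1: publish c_i
    for (c, r), v in zip(commitments, contributions):  # round 2: open and verify
        if not check(c, v, r):
            raise ValueError("invalid opening")
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), contributions)


def to_index(coin: bytes, k: int) -> int:
    """Map the coin to an execution index in 1..k (slight modulo bias ignored)."""
    return int.from_bytes(coin, "big") % k + 1


coin = coin_toss(4)
print(len(coin), 1 <= to_index(coin, 5) <= 5)  # 16 True
```

Since every commitment is fixed before any value is revealed, no party can choose its contribution as a function of the others, so the XOR is uniformly random as long as one contribution is.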


Passively Secure Protocol. The compiler presented in this work is designed to 
compile an arbitrary input-independent protocol $\Pi_{pass}$ with passive security.
Furthermore, we require the parties to agree on a public transcript that is the
same in case of an honest execution, to compare to expected executions later
on. To obtain such transcripts, we assume a fixed ordering in the messages and
that every party can see each message sent during an execution of the protocol.
In case $\Pi_{pass}$ is secure against $n - 1$ corruptions, we can simply broadcast every
message since the adversary was allowed to see each message anyway. Otherwise,
we need to keep the messages hidden by broadcasting symmetric-key encrypted
messages instead, as presented in [10]. To ease notation, we will assume $\Pi_{pass}$
to be secure against $n - 1$ corruptions, but adding symmetric-key encryption for
an arbitrary number of corruptions could be realized in a straightforward way
by simply opening the keys in the execution opening protocol as well.


4 PVC Compiler 


In this section we present the main compiler Π_comp for transforming an
arbitrary n-party MPC protocol Π_pass with passive security and no private inputs
into an n-party MPC protocol with covert security and public verifiability. This
compiler uses a commitment scheme, a signature scheme, a publicly verifiable
secret sharing scheme (PVSS) and an actively secure coin-tossing protocol. We
assume that every party has already registered a public key at the start of the
protocol. Roughly speaking, Π_comp works in four separate phases: seed
generation, protocol execution, evidence creation, and execution opening and
verification. In the seed generation phase, the parties set up k seeds from which
they derive their randomness during the k executions of Π_pass. In the protocol
execution phase, Π_pass is executed k times. In the evidence creation phase, the
parties use the PVSS to secret share their seed openings to all the other parties
and sign the information so that they can be held accountable later on. Finally,
in the execution opening and verification phase, the parties toss a coin to select
k − 1 executions, open the randomness seeds for these k − 1 executions and verify
the behavior of the other parties. If no cheating is detected, the parties output
their output in the unopened execution. Otherwise, they output the obtained
certificate. A formal description of Π_comp can be found in the full version of this
paper [1].


Π_seed: Seed generation procedure

This protocol works with an arbitrary number n of parties P = {P_1, P_2, ..., P_n}. To
generate uniformly random seeds for every party, the parties execute the following
steps:

1. Party P_i samples uniformly at random a private seed seed_(i,priv), generates
   (c^i, d^i) ← Com(seed_(i,priv)) and sends c^i to the other parties.
2. For each j ∈ [n], P_i samples a uniformly random public seed share seed^(i)_(j,pub)
   and sends seed^(i)_(j,pub) to all the other parties.
3. Each party calculates the public seed seed_(j,pub) for each P_j as the sum of
   the shares seed^(i)_(j,pub) over all i ∈ [n].
4. If the parties have not received all the expected messages before some predefined
   timeout, the parties send abort to all the other parties and output abort.
   Otherwise, P_i outputs (seed_(i,priv), d^i, {seed_(j,pub), c^j}_{j ∈ [n]}).


Seed Generation. In order to guarantee covert security for any passively secure
protocol Π_pass, we need to be guaranteed that the used randomness is picked
uniformly at random. To achieve this, we run an actively secure seed generation
procedure Π_seed for each of the executions of Π_pass. A formal description of this
procedure can be found in Π_seed.

In the seed generation procedure, each party P_i picks a private seed seed_(i,priv)
for itself and publicly commits to this seed. Together all the parties generate a
public seed for each party P_i by first picking a public seed share and defining
the public seed seed_(i,pub) for P_i as the sum of the shares of all the parties. During
the executions of Π_pass, the parties derive randomness from a seed that is the
XOR of the private and public seed, and is thus uniformly random.
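
The following Python sketch illustrates the data flow of the seed generation
procedure for honest parties running in a single process. It is not the
construction of the paper: the hash-based commitment, the 128-bit seed length,
the use of XOR as the sum of the public seed shares, and all function names
are assumptions made for the sake of a runnable example.

```python
import hashlib
import secrets

LAMBDA_BYTES = 16  # assumed seed length (128 bits)


def commit(seed: bytes):
    """Hash-based commitment, an illustrative stand-in for Com."""
    opening = secrets.token_bytes(LAMBDA_BYTES) + seed  # randomness || seed
    return hashlib.sha256(opening).digest(), opening


def open_commitment(c: bytes, opening: bytes):
    """Return the committed seed, or None (standing in for ⊥) if invalid."""
    if hashlib.sha256(opening).digest() != c:
        return None
    return opening[LAMBDA_BYTES:]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def generate_seeds(n: int):
    """Simulate the seed generation for n honest parties."""
    # Step 1: every party samples a private seed and commits to it.
    priv = [secrets.token_bytes(LAMBDA_BYTES) for _ in range(n)]
    commitments, openings = zip(*(commit(s) for s in priv))
    # Step 2: shares[i][j] is the public seed share that party i samples for party j.
    shares = [[secrets.token_bytes(LAMBDA_BYTES) for _ in range(n)] for _ in range(n)]
    # Step 3: the public seed of party j combines the shares of all parties.
    pub = []
    for j in range(n):
        acc = bytes(LAMBDA_BYTES)
        for i in range(n):
            acc = xor(acc, shares[i][j])
        pub.append(acc)
    # The randomness seed used in an execution is private XOR public.
    seeds = [xor(priv[j], pub[j]) for j in range(n)]
    return priv, commitments, openings, pub, seeds
```

Because every party contributes a share to every public seed, the combined
seed is uniformly random as long as at least one contribution is, while the
commitment fixes the private part before the public shares are revealed.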


Protocol Execution. In the protocol execution phase, the parties run the passively
secure protocol k times in parallel and obtain an output y_j and transcript trans_j
for each of the executions. In these executions, every party sends each message
to every other party and signs each message so that it can be held accountable.


Evidence Creation. In the evidence creation phase, the parties are required to
generate publicly verifiable, encrypted shares for the opening information of all
of the k randomness seeds used:

({E_h(d_h)^(i,j)}_{h ∈ [n]}, dproof^i_j) ← PVSS.Distribute(d^i_j),

and publicly broadcast these using F_bb. This ensures availability of all the used
randomness seeds after the coin toss for verification by the honest parties. In case
the adversary aborts after seeing the coin toss, the honest parties can reconstruct
its seeds using these shares, which are guaranteed to be correct if the PVSS
verifications succeed. Furthermore, the parties sign the tuple

evidence_j = (i, j, {c^l_j, dproof^l_j}_{l ∈ [n]}, trans_j),

with which they can be held accountable later on.
In the next sections, we will explain the subprotocols Π_open, Π_reconstruct and
Blame executed in the execution opening and verification phase.


4.1 Execution Opening 


After executing k parallel instantiations of the passively secure MPC protocol,
the parties run Π_open to open the seeds used in k − 1 of these executions.
Before Π_open is executed, the parties have already published encrypted shares
of the opening information of all of their seeds. In Π_open, the parties then verify
these encrypted shares using the PVSS. If a verification fails, the parties generate
a certificate and abort. If all verifications succeed, the parties jointly toss a coin
to select the executions to open. At this point, it is too late for an adversary to
abort since its seed openings have already been correctly distributed. Now, either
the parties simply open all the seeds used in these executions (the optimistic
case) or engage in Π_reconstruct to reconstruct missing seeds (the pessimistic case).
Note that we cannot simply mark parties who fail to open their seeds as
malicious: no publicly verifiable certificate of this can be generated, since an
external judge cannot distinguish an active attack from an accidental abort.
As an additional benefit, the PVSS strategy gives us a form of
fault-tolerance. By being able to verify the executions in case of an abort, the
protocol can still continue and succeed in case an otherwise honest party was
accidentally not able to deliver this information in time.


4.2 Seed Reconstruction 


If reconstructions are required, the parties engage in an execution of Π_reconstruct.
This protocol starts by the parties announcing to everyone which seeds they are
missing. For every missing message received, the parties decrypt their own share
of the published share encryptions of the corresponding seed opening. This share
together with a publicly verifiable proof of correct decryption is then published
on the bulletin board. Using these proofs, the parties can then pool together
t + 1 shares of which the proofs are valid to reconstruct the correct seed opening.
Due to the honest majority, we have that t < n/2, which guarantees that the
honest parties can reconstruct missing seed openings in case an adversary refuses
to distribute them. Finally the parties output a complete set of all seed openings
D_i. Note that in an honest execution of the PVC protocol, the parties already
have a complete set after Π_open and thus Π_reconstruct can be skipped. Using all
the seeds, the parties can now verify the behavior of all the other parties in
the opened executions. This procedure has been extracted to a separate Blame
algorithm.
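
To make the threshold argument concrete, the sketch below uses plain Shamir
secret sharing over a prime field as a stand-in for the PVSS: it shows that any
t + 1 shares determine the secret, and that with t < n/2 the n − t honest
parties alone hold enough shares. Encryption of the shares and the proofs of
correct distribution and decryption are deliberately omitted; the field
modulus and the function names are assumptions of this example.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime used as field modulus (illustrative choice)


def share(secret: int, t: int, n: int):
    """Split secret into n shares such that any t + 1 of them reconstruct it."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the degree-t polynomial
            acc = (acc * x + c) % P
        return acc

    return [(i, poly(i)) for i in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from a list of (x, y) points."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

For n = 5 and t = 2, the three honest parties hold exactly t + 1 = 3 shares, so
a seed opening can be recovered even if both corrupted parties stay silent.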

Π_open: Protocol for opening a set of executions

At the start of the protocol, all the parties know the encrypted seed shares
{E_h(d_h)^(i,j)} of every party P_i, i ∈ [n], in every execution j ∈ [k] for every
party P_h, h ∈ [n], as well as the corresponding proofs dproof^i_j. Furthermore, the
parties have the signatures σ^i_j together with the corresponding evidence tuples
evidence_j. Finally, each party P_i holds a set of private seed openings
{d^i_1, d^i_2, ..., d^i_k}, a set of outputs {y^i_1, y^i_2, ..., y^i_k} and a set of
transcripts {trans^i_1, trans^i_2, ..., trans^i_k}. To open k − 1 protocol executions,
do the following:

Share Verification:

1. First, the parties use the Verify algorithm of the PVSS to check the validity of
   all the shares to generate the set:

   M = {(l, m) ∈ ([n], [k]) : PVSS.Verify(dproof^l_m, {E_h(d_h)^(l,m)}_{h ∈ [n]}) = ⊥}.

   If any of the parties obtain M ≠ ∅, choose the tuple (l, m) ∈ M with minimal l
   and m, calculate the certificate
   cert_invs = (pk_l, evidence_m, {E_j(d_j)^(l,m)}_{j ∈ [n]}, σ^l_m) and
   output corrupted_l.

Joint Coin Tossing Phase:

2. If all the verifications succeed, each party P_i sends (flip) to F_coin, receives
   (flip, r) and calculates the joint coin toss as coin = r mod k.
3. Now, the parties exchange the set of seeds they have used in the k − 1 executions
   according to the coin toss such that each party P_i obtains:

   D_i = {d^h_j : h ∈ [n], j ∈ [k] \ coin}

   Optimistic case: Each party P_i generates φ^i_j ← Sign(d^i_j) for all of its seed
   openings {d^i_j}_{j ∈ [k] \ coin} and sends (φ^i_j, d^i_j) to all the other parties.
   Each party P_i verifies the signatures and constructs D_i.

   Pessimistic case: If a number of parties P_j fail to publish their seed shares
   and/or valid signatures within a given amount of time, the parties engage in
   an execution of Π_reconstruct to obtain D_i.

Output:

4. Finally, each party outputs (D_i, coin).
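
The selection made in steps 2 and 3 of the opening protocol can be sketched as
follows. Executions are indexed from 0 here, which is a convention of this
example only, and both function names are hypothetical. The second helper
lists the openings that would have to be recovered in the pessimistic case.

```python
def executions_to_open(r: int, k: int):
    """Return (coin, opened): coin = r mod k stays closed, all others are opened."""
    if k < 2:
        raise ValueError("need at least two executions")
    coin = r % k
    opened = [j for j in range(k) if j != coin]
    return coin, opened


def missing_openings(received: dict, n: int, opened):
    """List the (party, execution) pairs whose seed opening did not arrive."""
    return sorted((h, j) for h in range(n) for j in opened if (h, j) not in received)
```

An empty result from missing_openings corresponds to the optimistic case, in
which no reconstruction is needed.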


Protocol Π_reconstruct

At the start of the protocol, the encrypted seed opening shares {E_h(d_h)^(i,j)} of
every party P_i, i ∈ [n], in every execution j ∈ [k] meant for every party P_h,
h ∈ [n], as well as the corresponding proof strings dproof^i_j are publicly known.
The parties recover the seed openings they are missing in the following way:

Missing seeds announcement:

1. Each party P_i starts with a (non-complete) set of seed openings D_i. Assume P_i
   did not receive the seed opening d^l_m of some party P_l in some execution m. Call
   the set of tuples (l, m) of missing seed openings ℰ_i.
2. For every tuple (l, m) ∈ ℰ_i, P_i sends a message missing(l, m) to all the other
   parties.

Missing seed reconstruction:

3. For every missing(l, m) message received by P_i, P_i performs the following steps:
   - If m == coin, skip this message.
   - Otherwise, P_i decrypts its corresponding share d_i from E_i(d_i)^(l,m), computes
     the string rproof^i_(l,m) and sends (send, (d_i, rproof^i_(l,m)), i) to F_bb.
4. For every tuple (l, m) ∈ ℰ_i, P_i does the following:
   - For every message received from F_bb of the form ((d_j, rproof^j_(l,m)), j), P_i
     verifies the rproof^j_(l,m).
   - Once t + 1 of the received proofs are successfully verified, P_i reconstructs the
     seed opening d^l_m from the t + 1 shares and adds this to D_i.

Output:

5. Finally, P_i outputs the set of seed openings D_i.


4.3 Blame Algorithm


In the Blame algorithm, the behavior of the parties is verified and a certificate is
generated in case cheating was detected. This Blame algorithm takes the view of
a party as input. First, the Blame algorithm verifies the seed openings of all the
parties. If an invalid seed opening was obtained via reconstruction, an invalid
opening (1) certificate is returned. In case the invalid seed opening was given
directly by the adversary, an invalid opening (2) certificate is generated. To
ensure the parties agree on which party cheated, the one with the lowest party
and execution id is picked. If all the seeds can be opened correctly, the Blame
algorithm simulates the executions using the randomness seeds obtained in the
previous step, resulting in expected transcripts. If for any execution the actual
transcript does not match the expected transcript, the first party deviating from
the protocol is identified and a deviation certificate is generated.
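
The transcript comparison performed by the Blame algorithm can be sketched as
below. A transcript is modelled as an ordered list of (sender, payload) pairs,
which is an assumption of this example, as are the function names; signature
checks, seed openings and certificate construction are left out.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[int, bytes]  # (sender id, payload): a hypothetical encoding


def first_deviation(actual: List[Message], expected: List[Message]) -> Optional[int]:
    """Return the sender of the first message that differs from the expected one."""
    for (sender, payload), (exp_sender, exp_payload) in zip(actual, expected):
        if sender != exp_sender or payload != exp_payload:
            return sender
    return None  # the transcripts agree on their common prefix


def blame(transcripts: Dict[int, List[Message]],
          expected: Dict[int, List[Message]]) -> Optional[Tuple[int, int]]:
    """Scan opened executions in increasing order; return (party, execution) or None."""
    for m in sorted(transcripts):
        culprit = first_deviation(transcripts[m], expected[m])
        if culprit is not None:
            return culprit, m
    return None
```

Scanning executions in increasing order and stopping at the first inconsistent
message mirrors the rule that all honest parties must accuse the same party.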


Algorithm Blame(view)

The Blame algorithm takes as input the view view of a party, which consists of:

- Public coin coin
- All the seed commitments and openings {c^i_j, d^i_j}_{i ∈ [n], j ∈ [k] \ coin}
- Encrypted seed shares {E_h(d_h)^(i,j)}_{h, i ∈ [n], j ∈ [k]}
- The set ℰ of tuples of seed openings obtained via reconstruction
- PVSS proofs for distribution {dproof^i_j}_{i ∈ [n], j ∈ [k]} and reconstruction
  {rproof^j_(l,m)}_{j ∈ [n], (l,m) ∈ ℰ}
- Public keys {pk_j}_{j ∈ [n]}, signatures {σ^i_j}_{i ∈ [n], j ∈ [k]} and
  {φ^i_j}_{i ∈ [n], j ∈ [k]}
- Additional information {evidence_j}_{j ∈ [k]}

To verify the behavior of the parties, do:

1. Open the private seeds of all the parties P_i, i ∈ [n], in each execution
   j ∈ [k] \ coin as seed'_(i,priv) ← Open(c^i_j, d^i_j).
2. Construct the set S = {(l, m) ∈ ([n], [k] \ coin) : seed'_(l,priv) == ⊥}. If S is not
   empty, pick the tuple (l, m) with the lowest l, m and produce an invalid opening
   certificate:
   - If (l, m) ∈ ℰ: set
     cert_invo1 = (pk_l, evidence_m, {d_j, rproof^j_(l,m)}_{j ∈ [n]}, {E_j(d_j)^(l,m)}_{j ∈ [n]}, σ^l_m).
   - Otherwise: set cert_invo2 = (pk_l, evidence_m, d^l_m, φ^l_m, σ^l_m).
   And output (l, cert_invo(1/2)).
3. If all the verifications succeeded, set seed^i_j = seed'_(i,priv) ⊕ seed_(i,pub) as the
   randomness seed of each party P_i in each execution j ∈ [k].
4. Re-run each execution j of Π_pass for j ∈ [k] \ coin by simulating party P_i using
   random seed seed^i_j to obtain each transcript trans'_j.
5. Using evidence_j, construct the set S = {m : trans_m ≠ trans'_m}. If S is not empty,
   pick the lowest m and find the party P_l that sends the first message in trans_m
   which is inconsistent with the expected message from trans'_m, and construct a
   protocol deviation certificate

   cert_dev = (pk_l, evidence_m, {d^i_m}_{i ∈ [n]}, σ^l_m),

   and output (l, cert_dev). Otherwise, output (·, ⊥).


4.4 Judge Algorithm


The Judge algorithm takes a certificate and verifies it to confirm that the accused
party actually cheated. If the verification succeeds, the public key of the cheater
is output; otherwise ⊥ is output. This algorithm does not require any
communication with the parties and can thus be run by third parties as well. We
assume the judge has access to the messages publicly stored via F_bb. The judge
performs a number of steps depending on the certificate type. If the certificate
does not match any of the four templates, ⊥ is returned. Regardless of which
certificate type it receives, it first verifies the signature of the accused party
on the evidence. If this signature is invalid, we can never be sure that the
information was communicated by the accused party and thus ⊥ is returned.


4.5 Security 


To prove that the compiler presented above satisfies Definition 2 for covert
security with public verifiability, we first state the guarantees in Theorem 1 and
then prove that our compiler satisfies the requirements of covert security (with
deterrence rate ε), public verifiability and defamation-freeness separately.

Algorithm Judge(cert)

We assume the judge knows the function Π_pass to be computed. To check a certificate,
do:

- If Verify(evidence_m, σ^l_m) = ⊥, output ⊥.
- Else, interpret evidence_m as (i, m, {seed_(h,pub), c^h_m, dproof^h_m}_{h ∈ [n]}, trans_m).

Depending on the type of certificate, do:

invs:
- cert_invs = (pk_l, evidence_m, {E_j(d_j)^(l,m)}_{j ∈ [n]}, σ^l_m).
- If PVSS.Verify(dproof^l_m, {E_j(d_j)^(l,m)}_{j ∈ [n]}) = ⊥, output pk_l. Otherwise,
  output ⊥.

invo1:
- cert_invo1 = (pk_l, evidence_m, {d_j, rproof^j_(l,m)}_{j ∈ [n]}, {E_j(d_j)^(l,m)}_{j ∈ [n]}, σ^l_m).
- If PVSS.Verify(dproof^l_m, {E_j(d_j)^(l,m)}_{j ∈ [n]}) = ⊥, output ⊥.
- Verify t + 1 of the rproof^j_(l,m)'s and use the corresponding d_j's to reconstruct
  d^l_m. If no t + 1 valid shares are available, output ⊥.
- If Open(c^l_m, d^l_m) ≠ ⊥, output ⊥. Otherwise, output pk_l.

invo2:
- cert_invo2 = (pk_l, evidence_m, d^l_m, φ^l_m, σ^l_m).
- If Verify_{pk_l}(d^l_m, φ^l_m) = ⊥, output ⊥.
- If Open(c^l_m, d^l_m) ≠ ⊥, output ⊥. Otherwise, output pk_l.

dev:
- cert_dev = (pk_l, evidence_m, {d^i_j}_{i ∈ [n], j ∈ [k] \ coin}, σ^l_m).
- For every party P_i and execution m, open seed_(i,priv) ← Open(c^i_m, d^i_m) and
  calculate seed^i_m = seed_(i,priv) ⊕ seed_(i,pub).
- Re-run execution m of Π_pass by simulating each party P_i using random seed
  seed^i_m to obtain transcript trans'_m.
- If trans'_m == trans_m, output ⊥.
- If the first party that sends an incorrect message in trans_m is indeed P_l, output
  pk_l. Otherwise, output ⊥.

Otherwise:
- If the certificate does not match any of the four formats, output ⊥.


Theorem 1. Suppose the PVSS (Distribute, Verify, Reconstruct) satisfies
the privacy, correctness and soundness properties with a threshold t < n/2.
Furthermore, assume the commitment scheme (Com, Open, Verify) is binding and
hiding. Let the signature scheme (Gen, Sign, Verify) be existentially unforgeable
under chosen-message attacks. Finally, assume Π_coin implements F_coin with active
security. If Π_pass is passively secure, the compiler COMP_PVC = (Π_comp, Π_open,
Π_reconstruct) with the additional algorithms Blame and Judge is covertly secure with
public verifiability against t < n/2 corruptions with deterrence rate ε = 1 − 1/k.

Intuitively, an adversary can try to cheat in a number of ways in the resulting
protocol Π_PVC. First, it can do so by causing the seed openings of its own seeds
to fail. This could be achieved by either (i) distributing inconsistent shares in
step 3 of Π_comp or (ii) sending an incorrect opening in step 3 of Π_open. Cheating
strategy (i) is easily detected by the verification algorithm of the PVSS scheme,
which anyone can verify. Furthermore, the proofs of correct decryption ensure
that the adversary cannot announce a wrong share and the honest parties will
always obtain the correct seed openings. Cheating strategy (ii) is noticed when
any of the seed openings fail. In this case, the adversary has already published a
signature on the commitment and on the opening, which means anyone can see
that the opening fails and the adversary must have sent this.

Furthermore, an adversary can attempt to cheat by deviating from the pro- 
tocol description in any of the protocol executions. Since the protocol is run 
without private inputs, deviating means sending a message that is inconsistent 
with the protocol description and the committed randomness. If all of the seed 
openings succeeded, the parties can detect this when simulating the protocol 
executions later on. Since everyone knows the commitment and the opening, 
everyone knows the randomness that should have been used. Furthermore, the 
commitments to the seeds have been signed and thus an adversary cannot deny 
that he has sent an inconsistent message. A formal proof of this theorem can be 
found in the full version of this paper [1]. 
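
The deterrence rate can be illustrated with a short calculation. The sketch
below assumes that the coin selecting the unopened execution is uniform over
the k executions and that cheating in an opened execution is always detected;
the function names are hypothetical.

```python
from fractions import Fraction


def deterrence_rate(k: int) -> Fraction:
    """Probability that a single cheated execution is among the k - 1 opened ones."""
    if k < 2:
        raise ValueError("need at least two executions")
    return 1 - Fraction(1, k)


def escape_probability(k: int, cheated: int) -> Fraction:
    """Chance that cheating in `cheated` executions remains undetected."""
    if cheated == 0:
        return Fraction(1)     # nothing to detect
    if cheated == 1:
        return Fraction(1, k)  # only the unopened execution escapes checking
    return Fraction(0)         # at least one cheated execution is always opened
```

For k = 2 this gives a deterrence rate of 1/2, and increasing k moves the rate
towards 1 at the cost of k executions of the passively secure protocol.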


5 Computation and Communication Complexity 


In this section, we analyze the computation and communication complexity of 
our compiler. For concreteness, we assume that the PVSS used for our compiler 
is the scheme presented by Schoenmakers [18], but stress that our compiler will 
work with any PVSS satisfying Definition 3. As our compiler simply executes the 
passively secure protocol k times while signing the messages, the computational 
complexity of the protocol execution phase is roughly k times the passively 
secure protocol. Note that the k executions are independent of each other and 
can therefore fully be executed in parallel, preserving the round complexity of 
the passively secure protocol. In terms of communication, each party needs to
be able to see each message sent during the protocol execution. Therefore, the
communication complexity of the compiler increases by a factor of n − 1. Note
that this is inherent to all currently known constructions for compilers in our
setting [10, 12, 19].

The main difference in terms of complexity between our work and previous
works lies in the execution opening and verification phase, where the goal is to
open k − 1 executions while preventing a detection-dependent abort. The total
number of exponentiations required to distribute the seed openings of k
executions with n parties and verify the distributed seeds of all the parties is given
in Table 1. Furthermore, the number of exponentiations required for decryption
and reconstruction in case of an aborting adversary is given as well. Here, m is
the total number of missing messages while e is the number of missing seeds
of a single party. The total number of group elements communicated via F_bb
in the execution opening phase of our protocol is given in Table 2. In an honest
execution, every party uses the PVSS to distribute its seeds and then simply
opens its seeds. If a party refuses to do this, for m distinct seeds missing, the
parties need to publish their decrypted shares together with a proof of correct
decryption.


5.1 Comparison with Prior Work 


In contrast to our approach, the deterrence rate ε of [10] is inversely
proportional to the number of parties n. For this reason, we focus on comparing our
construction with the TLP approach of Faust et al. [12]. More specifically, we
focus on comparing the execution opening and verification phase. In our case,
this is realized by Π_open and possibly Π_reconstruct, while the work of [12] uses a
maliciously secure TLP generation functionality for this.


Table 1. Computation complexity as number of modular exponentiations.

Step            Comp. Complexity
Distribution    nnti · k
Verification    (ce + 4n) · (kn − k)
Decryption      3em
Reconstruction  (4 · n + 3) · e


Table 2. Communication complexity as number of field elements communicated
per party.

Step            Comm. Complexity
Distribution    k · (n/2 + 2n + 1)
Opening         2k
Reconstruction  2 · m


Note that their puzzle generation does not include the solving of a TLP. The
puzzle generation always has to be executed but the parties only need to solve
a TLP in case of an abort. They present an estimation for the total number of
AND gates for the circuit of this puzzle generation functionality. This circuit
has a linear complexity in the number of parties, while our seed distribution
introduces a cubic computational complexity. However, the complexity of their
functionality is dependent on the length of the RSA modulus N in the terms
192|N|^3 + 112|N|^2 + 22|N|. To illustrate the effects of both complexities, we
present a concrete example. Take an honest execution of the protocol with n = 5,
t = 2, k = 2 and thus ε = 1/2. With a security parameter of 128 bits, our approach
costs approximately 10^8 bit operations while the circuit of [12] requires in the
order of 10^12 AND gates to be maliciously evaluated for an RSA modulus of
2048 bits.

In terms of communication complexity, our solution is linearly dependent
on the number of parties and in the above scenario, the opening phase would
require around 31 group elements to be communicated via F_bb. Assuming F_bb is
naively implemented using an echo-broadcast protocol, this would require each
party to send (n − 1)^2 + 3n + 3 messages per group element. In the above example,
this would mean each party has to communicate around 8000 bytes with 64-bit
messages. Instantiating [12] with the actively secure protocol of Yang et al. [22]
requires 193 bytes per party per multiplication triple. This would thus require
in the order of 10^14 bytes to be communicated. Altogether, we expect our
construction to outperform the earlier works in practical scenarios.
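
The orders of magnitude quoted in this comparison can be re-derived with a few
lines of Python. The formulas are the ones stated in the text; the function
names are hypothetical, and counting one multiplication triple per AND gate is
an assumption of this sketch rather than a statement from [12] or [22].

```python
def and_gates(modulus_bits: int) -> int:
    """Terms quoted in the text for the puzzle-generation circuit of [12]."""
    n = modulus_bits
    return 192 * n**3 + 112 * n**2 + 22 * n


def echo_broadcast_bytes(n: int, group_elements: int, message_bytes: int = 8) -> int:
    """Bytes sent per party if each group element costs (n - 1)^2 + 3n + 3 messages."""
    messages_per_element = (n - 1) ** 2 + 3 * n + 3
    return group_elements * messages_per_element * message_bytes


# Example with the parameters of the text: n = 5, 31 group elements, 64-bit messages.
ours = echo_broadcast_bytes(5, 31)   # 31 * 34 * 8 bytes
theirs = 193 * and_gates(2048)       # assumes one triple per AND gate
```

With these inputs, ours evaluates to 8432 bytes, in line with the figure of
around 8000 bytes, while and_gates(2048) is about 1.6 · 10^12 and theirs is
about 3 · 10^14 bytes.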
Abstract. Consensus protocols enable n parties, each holding some
input string, to agree on a common output even in the presence of
corrupted parties. Recent work has pushed to understand the problem when
a majority of parties may be corrupted, thus providing higher resilience,
and under various forms of corruptions. Zikas, Hauser, and Maurer
introduced a model in which receive-corrupt parties may not receive messages
sent to them, and send-corrupt parties may have their sent messages
dropped. Otherwise, receive-corrupt and send-corrupt parties behave
honestly and their inputs and outputs are constrained by the security
definitions. Zikas, Hauser, and Maurer gave a perfectly secure,
linear-round protocol for n > t_rcv + t_snd + 3t_byz, where t_rcv, t_snd, and t_byz
represent thresholds on receive-, send-, and byzantine-corruptions.

We present the first expected constant-round protocol in the general
corruption model tolerating n > t_rcv + 2t_snd + 2t_byz. In comparison, all
current sublinear round consensus protocols fail if there exists even a single
party which cannot communicate with some honest parties, but whose
output must be consistent with the honest parties. While presenting
our protocol, we explore the pathology of send-corruptions and
characterize the difficulty of dealing with them in sublinear-round protocols.
As an illustrative and surprising example (even though not in sublinear
rounds), we show that the classical Dolev-Strong broadcast protocol
degrades from tolerating t_byz < n corruptions in the byzantine-only model
to t_byz < n/2 − t_snd when send-corrupt parties’ outputs must be consistent
with honest parties; we also show why other recent dishonest-majority
broadcast protocols degrade similarly.

We prove that our new consensus protocol achieves an optimal
threshold of n > t_rcv + t_snd + 2t_byz when we constrain the adversary to either drop
all or none of a sender’s messages in a round (we denote this model by
spotty send corruptions). To our knowledge, our protocol for the spotty
send corruption model is thus the first sublinear-round consensus
protocol for a majority of online faulty parties in any model. Because we
are unable to prove optimality of our protocol’s corruption budget in the
general case, we leave open the question of optimal corruption tolerance
for both send-corruptions and byzantine-corruptions.


Keywords: Consensus - byzantine agreement - constant rounds - 
1 Introduction


Consensus protocols, also known as byzantine agreement protocols, enable n 
parties, each holding some input value, to agree on common outputs even in 
the presence of byzantine corrupted parties. However, the byzantine model often 
does not reflect the real world; in practice, crashing a party, or even forcing 
inconsistent uplink or downlink behavior, is much easier than corrupting it. 

A line of work has explored mixed models in which both crash faults and
byzantine faults are permitted. Garay and Perry [11] and Altmann, Fitzi, and
Maurer [4] show that byzantine agreement is possible if and only if
n > t_cra + 3t_byz. In the asynchronous model, Backes and Cachin [5] showed that
broadcast within the mixed model is possible if and only if n > 2t_cra + 3t_byz.
Kursawe [15] developed a consensus protocol for the same bound assuming a
public key infrastructure (PKI). Recently, Wan et al. [21,22] showed round
efficient broadcast protocols for dishonest majorities. Zikas, Hauser and
Maurer [23] gave a protocol in the error-free synchronous model for
n > t_rcv + t_snd + 3t_byz, where t_rcv bounds the number of receive corruptions,
t_snd bounds the number of send corruptions, and t_byz bounds the number of
byzantine corruptions.
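
For quick comparison, the resilience bounds mentioned so far can be written as
simple predicates. This is only a restatement of the inequalities quoted in
the text and in the abstract; the function names are hypothetical and the
predicates say nothing about how the corresponding protocols work.

```python
def mixed_crash_byzantine_ok(n: int, t_cra: int, t_byz: int) -> bool:
    """Synchronous mixed model of [11] and [4]: n > t_cra + 3 t_byz."""
    return n > t_cra + 3 * t_byz


def zikas_hauser_maurer_ok(n: int, t_rcv: int, t_snd: int, t_byz: int) -> bool:
    """Linear-round protocol of [23]: n > t_rcv + t_snd + 3 t_byz."""
    return n > t_rcv + t_snd + 3 * t_byz


def general_model_ok(n: int, t_rcv: int, t_snd: int, t_byz: int) -> bool:
    """Bound stated for the general corruption model: n > t_rcv + 2 t_snd + 2 t_byz."""
    return n > t_rcv + 2 * t_snd + 2 * t_byz


def spotty_model_ok(n: int, t_rcv: int, t_snd: int, t_byz: int) -> bool:
    """Bound stated for spotty send corruptions: n > t_rcv + t_snd + 2 t_byz."""
    return n > t_rcv + t_snd + 2 * t_byz
```

For example, with n = 8, t_rcv = 2, t_snd = 2 and t_byz = 1, the bound of [23]
and the spotty bound hold, while the general-model bound does not.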


Faulty Parties With Consistent Outputs. Zikas, Hauser and Maurer introduced 
parties which may be faulty but the faulty processors’ outputs must be consistent 
with honest parties’ outputs because they otherwise behave honestly. In all other 
corruption models, the output of any faulty party need not be considered by 
the definition. We show that the duality of a send-corrupt party whose outputs 
must nonetheless be consistent with honest parties introduces new challenges 
for achieving consensus in sublinear rounds. It is currently not known how to 
push corruption tolerance for sublinear-round broadcast to the dishonest majority 
setting, and for sublinear-round consensus we do not know how to do better than 
treating a send-corrupt party as fully byzantine. However, treating the otherwise- 
honest party in this way also forfeits any guarantees on its output. 


1.1 Send and Receive Corruptions: Honest-but-Faulty 


Send-corrupt parties participate in a protocol as honest parties do, but an adver- 
sary has the power to determine which messages sent by a send-corrupt party 
are delivered and which are not. Nevertheless, they still listen to the protocol 
and their outputs must be consistent with the honest parties’ outputs. 
Receive-corrupt parties may cease to receive messages, but the messages they 
send are delivered. A receive-corrupted party may detect that it is receive- 
corrupted if it does not receive messages that it is expecting. If a receive- 
corrupted party detects that it is corrupted, then — as in [23] — the party enters 
a zombie state. A zombie party stops sending and receiving messages, and
outputs ⊥, becoming the functional equivalent to a crashed party in the common
literature. If a receive-corrupted party has not detected that it is corrupt then 
it may continue to participate, and we require that its output agrees with the 
honest parties’ outputs, even though it may not receive all protocol messages. 
Zombies and Live Parties. Because send-corrupted and receive-corrupted
parties may still continue to participate without intentionally deviating from 
the protocol, our definitions require that their outputs (if they produce outputs) 
are consistent with those of the honest parties. We call all honest, send-corrupt, 
and non-zombie receive-corrupt parties live parties; this denotes that the party 
continues to (try to) participate as if it were an honest party. 

We use the convention that whenever a party becomes a zombie, it sends a
special message (zombie) to all other parties. Upon receiving such a message, a
party decrements its count n of the number of parties, and also decrements
its threshold for the number of receive-corrupted parties. Note that
send-corrupt parties may fail to send their zombie declarations, and
receive-corrupt parties may fail to receive other parties’ declarations.
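
The bookkeeping described by this convention can be sketched as follows. The
class and method names are hypothetical, and counting each sender's
declaration at most once is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class LocalView:
    n: int      # number of parties this party still counts
    t_rcv: int  # remaining threshold of receive-corrupted parties
    zombies: set = field(default_factory=set)

    def on_zombie(self, sender: int) -> None:
        """Process a (zombie) declaration received from `sender`."""
        if sender in self.zombies:
            return  # each declaration is counted once (assumption of this sketch)
        self.zombies.add(sender)
        self.n -= 1
        self.t_rcv -= 1
```

Since declarations may be dropped or missed, two live parties can hold
different local values of n and t_rcv at the same time.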


1.2 The Pathology of Send Corruptions 
We consider two forms of send corruptions, one more pathological than the other. 


1. Standard send corruption: In the general case (denoted as simply a send 
corruption) as in [23], the adversary may adaptively drop any of a send- 
corrupt party’s outgoing messages in any round. 

2. Spotty send corruption: In a weaker case, an adversary adaptively drops either 
all or none of a send-corrupt party’s outgoing messages in a round. 
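
The difference between the two forms can be expressed as a condition on the
set of recipients whose copy of a message is dropped in a given round. The
predicates below are an illustrative formalization with hypothetical names.

```python
def allowed_standard(dropped: set, recipients: set) -> bool:
    """Standard send corruption: any subset of the outgoing messages may be dropped."""
    return dropped <= recipients


def allowed_spotty(dropped: set, recipients: set) -> bool:
    """Spotty send corruption: either no message or every message of the round is dropped."""
    return dropped == set() or dropped == recipients
```

Every drop pattern allowed under spotty send corruption is also allowed under
standard send corruption, but not the other way around.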


Pathology of a (Standard) Send Corruption. Our standard model of a send cor- 
ruption permits the adversary to selectively drop messages by send-corrupting 
a party. Because this behavior is a subset of a byzantine corruption, one would 
expect that corruption bounds follow directly from the byzantine case. We show 
that this is not the case in general. In our model, a send-corrupt party may 
receive a message that would change its output and fail to inform any honest 
party about the message. 

As an illustrative example (embodying a common technique), the Dolev- 
Strong broadcast protocol requires that if some honest party — whose output 
is constrained by definition — receives a message, then all other honest parties 
will receive that message before the protocol terminates. But as we show in 
Sect. 3.1, Dolev-Strong breaks down in our model because a send-corrupt party 
may receive a message that would change its output but fail to forward it. 

In the extreme case, divide an execution into sets such that S contains all 
send-corrupt parties and H contains all honest parties, and let |S| > |H|. Then 
it may be the case that a majority of parties cannot communicate with the honest 
parties, but all of their outputs must be consistent. For this reason, it appears 
very difficult to tolerate more send-corrupt parties than honest parties. Any 
such construction must ensure that sufficiently many parties are “aware” of a 
message to allow it to influence the output. Specifically, we do not know how to 
generate and use information that an honest party has not received a message 
sent by a send-corrupt party. On the other hand, impossibility proofs that depend
on partitioning techniques also fail in this model because it is impossible to
completely separate the send-corrupt group from the honest group, since send-
corrupt parties always receive all of the honest parties’ messages.


Pathology of a Spotty Send Corruption. We argue that although our “spotty” 
send corruption is limited in some ways, it is still rich enough to force pop- 
ular techniques for synchronous consensus to fail. In particular, it is unclear 
how to construct a protocol that employs leader election in order to reach con- 
stant expected round complexity in our model. Specifically, a strongly rushing 
adversary as described above can wait for a leader to be elected — and even to 
send messages that attest to its election (e.g., based on a VRF, as in [12,19]) — 
then spotty-corrupt the party, and force it to fail as leader for the duration of 
its tenure. While in the purely byzantine model this attack can be mitigated by 
using threshold signatures (see, e.g., [2,16]), this approach completely fails in our 
model, as electing a leader would most likely elect one of the tsnd send-corrupt
parties (since tsnd can be much larger than the number of honest parties). For
this reason, recent protocols for dishonest-majority broadcast that rely on the
player-replaceable paradigm, such as [6] and [1], fail in our model.


1.3 Contributions 


We provide the first systematic treatment of the pathology of send-corruptions, 
and show that considering send-corrupt parties as “nearly” honest in the defi- 
nition either completely breaks or substantially deteriorates the corruption toler- 
ance of both classical and recent broadcast protocols. 

We then provide an expected constant-round byzantine agreement protocol
that is secure in the strongly adaptive setting against tsnd send corruptions, trcv
receive corruptions, and tbyz byzantine corruptions, where trcv + 2tsnd + 2tbyz < n.
Our protocol builds consensus from graded consensus and a common coin [8,14],
with subtle adaptations for our corruption model and a parallelization of the
implementation of FixReceive from [23]. When send corruptions are spotty, we
show that our protocol achieves the optimal corruption tolerance of
trcv + tsnd + 2tbyz < n.
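To make the two corruption bounds concrete, the following Python sketch evaluates them for one hypothetical corruption budget. The function names and the example numbers are illustrative assumptions, and the honest-party count assumes that the three corrupted sets are disjoint.

```python
def tolerates_standard(n: int, t_rcv: int, t_snd: int, t_byz: int) -> bool:
    """Bound for standard send corruptions: t_rcv + 2*t_snd + 2*t_byz < n."""
    return t_rcv + 2 * t_snd + 2 * t_byz < n


def tolerates_spotty(n: int, t_rcv: int, t_snd: int, t_byz: int) -> bool:
    """Bound for spotty send corruptions: t_rcv + t_snd + 2*t_byz < n."""
    return t_rcv + t_snd + 2 * t_byz < n


# Hypothetical budget: 10 parties, 1 receive, 5 send, 1 byzantine corruption.
n, t_rcv, t_snd, t_byz = 10, 1, 5, 1
honest = n - (t_rcv + t_snd + t_byz)   # assumes disjoint corrupted sets
standard_ok = tolerates_standard(n, t_rcv, t_snd, t_byz)
spotty_ok = tolerates_spotty(n, t_rcv, t_snd, t_byz)
print(honest, standard_ok, spotty_ok)  # 3 False True
```

In this example only 3 of the 10 parties are honest, yet the spotty bound is still satisfied, while the bound for standard send corruptions is not.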

To our knowledge, our protocol for the spotty send model is the first sublinear- 
round consensus protocol for a majority of online faulty parties in any model. 


1.4 Comparison with Related Work and Obvious Solutions 


Recent Advances in Dishonest-Majority Broadcast. One might expect
that because dishonest-majority broadcast protocols tolerate n > tbyz
corruptions, they are sufficient for building a consensus protocol tolerating
n > tsnd + 2tbyz corruptions via folklore reductions (which we discuss in detail in
the full version), which would achieve better corruption tolerance than our
construction in Sect. 4. We show that this is not true and that recent advances in
dishonest-majority broadcast for adaptive adversaries by Wan et al. [21,22] also
fail in our model.

The work of Wan et al. [22] provides an expected constant-round protocol
for dishonest majority broadcast under a weakly adaptive adversary. However,
their “Trust Graphs” assume that only byzantine parties do not send messages,
and any party that fails to send a message can be excluded. This fails in our
model because send-corrupt parties must be consistent with honest parties.

Another recent work [21] uses time-lock puzzles to provide a round-efficient
broadcast protocol in the presence of a dishonest majority and a strongly adaptive
adversary. However, the approach also fails because honest parties may never
learn the puzzles sent by send-corrupt parties. It is possible to construct an
execution in which honest parties solve a set of puzzles T, and the send-corrupt
parties solve another set of time-lock puzzles T’ = T ∪ S, where S are puzzles
that are never distributed to the honest parties. However, our definitions require
that send-corrupt parties’ outputs match those of the honest parties.


Adapting ZHM [23] to an Expected Constant-Round Protocol. A natural
attempt to achieve sublinear-round consensus tolerating n > trcv + tsnd + 2tbyz
is to adapt the protocol by Zikas, Hauser and Maurer (ZHM) [23] to an expected
constant-round protocol using the standard construction [8,14] via graded
consensus and a common coin protocol. The ZHM protocol depends on the
phase-king paradigm [10]; it must run long enough to guarantee that the king is
honest in at least one round. To achieve an expected constant number of rounds,
phase king is replaced with a common coin; however, all common coin
constructions that we know of require some threshold scheme. Threshold schemes
work in our model when n − trcv > 2(tsnd + tbyz), meaning there are more honest
parties than send-corrupt or byzantine parties. In the dishonest majority setting
where send-corrupt plus byzantine parties outnumber honest parties, the
construction suffers from the partitioning attack described above: a group of
send-corrupt parties reaches the threshold independently of and without the
knowledge of honest parties, and honest parties therefore output a different coin
than send-corrupt parties. The ZHM construction and corruption bound therefore
fail in sublinear rounds.


Expected Constant-Round Consensus Protocols. There are a number
of expected constant-round consensus protocols for the honest-majority setting
that consider only byzantine faults. Feldman and Micali [8] gave an expected
constant-round scheme for n > 3tbyz. Katz and Koo [14] later gave a protocol
tolerating n > 2tbyz, assuming a PKI and signatures. Micali [18] gave another
simple protocol assuming n > 3tbyz. Abraham et al. [2] gave the most efficient
scheme, which tolerates a strongly rushing, adaptive adversary for n > 2tbyz.


Mixed Corruption Models. In Table 1 we overview the results most relevant 
to our work: consensus protocols in mixed corruption models. We include a 
construction by modifying Dolev-Strong broadcast (Sect. 3.1) via the reduction 
of consensus to broadcast. 

Table 1. Comparison with relevant consensus protocols in mixed corruption models.
A superscript R indicates that the round complexity is given in expectation;
otherwise it is the worst-case round complexity. DS denotes Dolev-Strong.

Protocol                 Faults                                               # Rounds
Modified DS (Sect. 3.1)  Send & Byz: 2tsnd + 2tbyz < n                        O(n)
GP [11]                  Crash & Byz: tcra + 3tbyz < n                        O(n)
ZHM [23]                 Receive, Send, Byz: tsnd + trcv + 3tbyz < n          O(n)
This paper               Receive, Spotty Send, Byz: trcv + tsnd + 2tbyz < n   O(1)^R
This paper               Receive, Send, Byz: trcv + 2tsnd + 2tbyz < n         O(1)^R

To our knowledge, our “spotty” send-corrupt protocol exceeds the corruption
bounds of all comparable models with “exotic” corruptions, which always
require that a majority of online nodes are honest. For example, recent work has
generalized crash corruptions into “sleepy” [20] or “sluggish” [13] faults. In the
sluggish model [13], a (mobile) sluggish party can be temporarily disconnected
from honest parties due to a network partition, but can later rejoin. While
disconnected, messages sent by or to a party are delayed until the party is
reconnected. However, in that work it is (inherently) required that at least half
of the parties are not sluggish and participate in the protocol at all times, and
the adversary is static. This is in sharp contrast to our model, which allows a
majority of dishonest parties and an adaptive adversary. Abraham et al. [3], also
in the sluggish model, require a majority of online parties to be honest at all
times.

In the “sleepy” model [20], the adversary can make parties “fall asleep” and 
later wake them up (i.e., temporarily crash them) at which point all messages 
that they missed are delivered at once, potentially along with adversarially- 
inserted messages. In their model, a protocol requires only that a majority of the 
awake parties are honest, which closely resembles our result. However, there are
no send or receive corruptions, meaning all awake parties are full participants
in the protocol: their sends always succeed and no incoming messages are
dropped. This avoids the difficulties studied in this paper.

Malkhi et al. [17] consider yet another mixed model of corruption, but require
that a majority of online players behave honestly. The protocol of Garay and
Perry [11] runs in O(n) rounds, but only works when n > 3tbyz + tcra.


1.5 Paper Outline 


The rest of the paper is organized as follows: Section 2 covers preliminaries and 
definitions required in the rest of this paper. Section 3 discusses the pathology 
of send corruptions by illustrating how common paradigms for broadcast fail 
for send-corrupt parties. Section 4 introduces our new expected constant-round 
consensus protocol for send and receive corruptions. 

In the full version, we include the following appendices. In Appendix A, we
provide the proofs of the protocols in Sect. 4. In Appendix B, we show that
the construction in Sect. 4 has improved corruption tolerance in the spotty send
model, and prove its optimality. In Appendix C we recall the classical protocol by
Dolev and Strong for authenticated broadcast. In Appendix D we give a (folklore)
construction of consensus from broadcast that completes the reduction of optimal
fault tolerance for consensus in the presence of (general) send corruptions to
optimal fault tolerance of broadcast in the presence of send corruptions.


2 Model and Definitions 


We consider a set of n parties P = {p1, ..., pn} who may send and receive
messages over a network. A protocol specifies the messages that parties send to 
each other, how they change their internal states, and how they produce their 
outputs. An execution of a protocol proceeds in a series of time steps, in which 
in each step each party first receives messages and then sends messages. We 
assume that all parties start an execution at the same time and have internal 
clocks that advance at the same rate. 


Network. We assume that the network is managed by an adversary that is 
constrained by synchronization requirements. Parties are connected via peer-to- 
peer authenticated channels. We assume a synchronous network; this means that 
any message sent at time t must be delivered to its intended recipient at time 
t + 1 (unless message delivery is attacked by the adversary, as described below).


Corruptions. The adversary may adaptively corrupt parties that participate 
in an execution. We allow an adversary to corrupt a party in one of three modes, 
which we describe in the following. A party that is not corrupted must follow 
the protocol specification and is called honest. Once a party is corrupted, it may 
not become honest again. 

A receive corruption allows the adversary to selectively drop messages sent 
to the party. A send corruption allows the adversary to selectively drop mes- 
sages sent by the party. A byzantine corruption allows the adversary to control 
all messages sent by the party and view its internal state. We categorize send 
corruptions in two types: 


1. A (standard) send corruption allows the adversary to adaptively drop arbi- 
trary messages sent by the party without constraint. 

2. A spotty corruption allows the adversary to adaptively drop all messages 
sent by a spotty party p at some time by issuing an instruction (drop, p) to 
the network. Specifically, the drop instruction is constrained such that all 
messages must be dropped if any message is dropped. Because all messages 
sent by a party at some time must either be delivered or dropped, we say 
that the adversary must uniformly drop or deliver messages for the party. 
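The difference between the two types of send corruption can be stated as a predicate on the set of messages that the adversary drops for one sender in one time step. The following Python sketch is one way to express it; the function name and the string labels "standard" and "spotty" are assumptions made for this example.

```python
def is_legal_drop(kind: str, recipients: set, dropped: set) -> bool:
    """Check the drops applied to one send-corrupt sender in one time step.

    recipients: parties the sender addressed in this step.
    dropped:    recipients whose copy of the message the adversary dropped.
    """
    if not dropped <= recipients:
        return False                      # only sent messages can be dropped
    if kind == "standard":
        return True                       # any subset may be dropped
    if kind == "spotty":
        # Uniform behaviour: drop nothing, or drop every message of the step.
        return not dropped or dropped == recipients
    raise ValueError(f"unknown corruption kind: {kind}")


recipients = {1, 2, 3}
print(is_legal_drop("standard", recipients, {2}))        # True
print(is_legal_drop("spotty", recipients, {2}))          # False
print(is_legal_drop("spotty", recipients, {1, 2, 3}))    # True
```

Every drop pattern that is legal for a spotty corruption is also legal for a standard one, which is why the spotty case is the weaker adversary.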


Recall that a party that detects it is corrupted declares itself a zombie; it does
this by outputting zombie. We call a party live if it is not byzantine-corrupted
and has not declared itself a zombie: all honest parties and send-corrupt parties
are live, and receive-corrupt parties that have not become zombies are live. We
note that the adversary does not need to corrupt both a sender and a receiver in
order to drop a message between them; it suffices for the adversary to corrupt
only one of them. We also do not require that the sets of send-corrupt and receive-
corrupt parties are disjoint. However, any party that is both send-corrupt and
receive-corrupt is counted toward both thresholds.


Strongly Rushing Adversary. We consider an adversary that is strongly 
rushing, similar to that of [1,2], but we extend it to drop messages from send- 
corrupt parties. In our model, a strongly rushing adversary is permitted to read 
messages that are sent by an honest party over the network and then choose to 
corrupt the party in the same time step. If the adversary chooses to send-corrupt 
the party, then it can drop messages sent by the party in that step; similarly, 
if the adversary chooses to receive-corrupt the party, then it can drop messages 
sent to the party in that step. In either case, the party is send- or receive- 
corrupted from that step forth. If the adversary chooses to byzantine corrupt 
a party in some step, it removes all messages sent by the party at that time 
step. The adversary then chooses what messages the party sends in that step, 
and to which parties it sends what messages. The corrupted party is byzantine 
corrupted from that time forth. 


2.1 Digital Signatures and Coin Flipping 


Our constructions require the use of a digital signature scheme. In particular, 
we assume that parties have access to a public key infrastructure (PKI) for a 
digital signature scheme, meaning each party is aware of a set of public keys
{pk_1, ..., pk_n}, where pk_i is associated with p_i for i ∈ [n]. We consider that
all parties choose their own public and private keys; in particular, some par- 
ties may adversarially choose their key pairs. Our constructions will assume an 
idealized signature scheme for which signatures are perfectly unforgeable; with 
signature schemes that achieve unforgeability against computationally bounded 
adversaries, our protocols achieve security except with negligible probability. 

Our construction requires the use of an unbiasable coin flipping protocol
Π^coin. We assume idealized access to such a primitive, as if implemented by an
ideal functionality that takes no input (or more formally, takes as input the
empty string) and delivers a uniformly random bit to all parties. Such a coin
flipping protocol may be instantiated (assuming a trusted setup) by augmenting
threshold signatures [16] (using threshold tbyz + 1, see below) with a protocol
for reliable sends in our model, such as FixReceive ([23], or ours below). At a
high level, we require that: (A) Until at least one live party queries Π^coin in the
r-th invocation, the output for that invocation is uniformly distributed for the
adversary. (B) All live parties output the same value in Π^coin.
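The two requirements on the coin can be pictured with a small idealized object. The sketch below is only a model of the ideal functionality, not an instantiation from threshold signatures; the class name `IdealCoin` and its interface are assumptions made for this example.

```python
import secrets


class IdealCoin:
    """Idealized common coin: one uniformly random bit per invocation r."""

    def __init__(self) -> None:
        self._bits = {}          # invocation index -> sampled bit

    def query(self, r: int) -> int:
        # (A) The bit for invocation r is sampled only at the first query,
        #     so nothing about it exists before some party asks for it.
        if r not in self._bits:
            self._bits[r] = secrets.randbits(1)
        # (B) Every later query for the same r returns the same bit.
        return self._bits[r]


coin = IdealCoin()
first = coin.query(1)
second = coin.query(1)           # a different party querying invocation 1
print(first == second)           # True
```

Sampling lazily on the first query mirrors requirement (A); storing the bit mirrors requirement (B).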


2.2 Defining Broadcast and Consensus 


We provide new definitions for the considered mixed model by adapting the
standard definitions of consensus protocols and constraining the behavior of all
live parties, including send-corrupt parties and receive-corrupt parties that have
not become zombies. Note that our definitions quantify over all the inputs of
live parties that participate starting at the beginning of an execution, and over
the outputs of only parties that are not zombies by the end of the execution.

Towards the definitions, we introduce thresholds on the number of corruptions
that we permit the adversary to make per execution. We use tsnd, trcv,
and tbyz to denote thresholds on the number of send, receive, and byzantine
corruptions, respectively, in an execution. We introduce the following definition of
an execution in which some parties may be corrupted in order to facilitate the
definitions of our consensus problems.


Definition 1 ((tsnd, trcv, tbyz)-Compliant Execution). For a protocol Π, we
say that an execution of Π is (tsnd, trcv, tbyz)-compliant if at most tsnd, trcv,
and tbyz parties are send-corrupted, receive-corrupted, and byzantine-corrupted,
respectively, in the execution.
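Definition 1 amounts to three independent counts. A minimal Python sketch follows, assuming an execution is summarized as a map from party identifiers to the set of corruption modes applied to that party; this representation and the function name are assumptions made for this example. A party that is both send-corrupt and receive-corrupt counts toward both thresholds, as stated in the model.

```python
def is_compliant(corruptions: dict, t_snd: int, t_rcv: int, t_byz: int) -> bool:
    """corruptions maps a party id to the set of modes applied to it."""
    snd = sum("send" in modes for modes in corruptions.values())
    rcv = sum("receive" in modes for modes in corruptions.values())
    byz = sum("byzantine" in modes for modes in corruptions.values())
    return snd <= t_snd and rcv <= t_rcv and byz <= t_byz


# Party 2 is both send- and receive-corrupt and is counted twice.
execution = {1: {"send"}, 2: {"send", "receive"}, 5: {"byzantine"}}
print(is_compliant(execution, t_snd=2, t_rcv=1, t_byz=1))  # True
print(is_compliant(execution, t_snd=1, t_rcv=1, t_byz=1))  # False
```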


Broadcast. In a broadcast protocol, a dealer D ∈ P wishes to send a message
m ∈ {0,1}* to the parties in P. Each party p ∈ P outputs a message m’ ∈
{0,1}* ∪ {⊥}, subject to the following constraints:


Definition 2 (Broadcast). Let Π be a protocol for parties P = {p1, ..., pn} in
which a distinguished party D ∈ P holds an input m ∈ {0,1}*. Π is a Broadcast
protocol if the following properties hold except with negligible probability.


1. (tsnd, trcv, tbyz)-Validity: Π is (tsnd, trcv, tbyz)-valid if in every (tsnd, trcv, tbyz)-
compliant execution in which D is honest or receive-corrupt (but not send-
corrupt), every live party outputs m.

2. (tsnd, trcv, tbyz)-Consistency: Π is (tsnd, trcv, tbyz)-consistent if in every
(tsnd, trcv, tbyz)-compliant execution in which any live party outputs m’ ∈
{0,1}* ∪ {⊥}, every live party outputs m’.

3. (tsnd, trcv, tbyz)-Termination: Π is (tsnd, trcv, tbyz)-terminating if in every
(tsnd, trcv, tbyz)-compliant execution, every live party outputs some m’ ∈
{0,1}* ∪ {⊥} and terminates within finitely many steps.


If Π is (tsnd, trcv, tbyz)-valid, (tsnd, trcv, tbyz)-consistent, and (tsnd, trcv, tbyz)-
terminating then we call it (tsnd, trcv, tbyz)-secure.


Consensus. In a (binary) consensus protocol, each party has an input b ∈
{0,1}. Each party is expected to output a bit v ∈ {0,1}.


Definition 3 (Consensus). Let Π be a protocol for parties P = {p1, ..., pn}
in which each party has an input b ∈ {0,1}. Π is a Consensus protocol if the
following properties hold except with negligible probability.


1. (tsnd, trcv, tbyz)-Validity: Π is (tsnd, trcv, tbyz)-valid if in every (tsnd, trcv, tbyz)-
compliant execution in which all live parties have the same input b ∈ {0,1},
all honest parties output b.

2. (tsnd, trcv, tbyz)-Consistency: Π is (tsnd, trcv, tbyz)-consistent if in every
(tsnd, trcv, tbyz)-compliant execution in which any live party outputs v, every
live party outputs v.

3. (tsnd, trcv, tbyz)-Termination: Π is (tsnd, trcv, tbyz)-terminating if in every
(tsnd, trcv, tbyz)-compliant execution, every live party outputs v ∈ {0,1} and
terminates within finitely many steps.


If Π is (tsnd, trcv, tbyz)-valid, (tsnd, trcv, tbyz)-consistent, and (tsnd, trcv, tbyz)-
terminating then we call it (tsnd, trcv, tbyz)-secure.
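The validity and consistency conditions of Definition 3 can be checked mechanically on the record of a finished execution. The Python sketch below does so under stated assumptions: the execution is summarized by maps from party identifiers to bits, `live` is the set of parties that are live at the end of the execution, and `honest` is a subset of `live`. Termination is not checked, since the sketch only sees executions that have already produced outputs. All names are hypothetical.

```python
def check_consensus(inputs: dict, outputs: dict, live: set, honest: set):
    """Return (validity, consistency) for one finished execution."""
    live_inputs = {inputs[p] for p in live}
    validity = True
    if len(live_inputs) == 1:
        # All live parties share input b, so every honest party must output b.
        b = next(iter(live_inputs))
        validity = all(outputs[p] == b for p in honest)
    # Every live party, including send-corrupt ones, must output the same bit.
    consistency = len({outputs[p] for p in live}) <= 1
    return validity, consistency


# Parties 0 and 1 are honest, party 2 is send-corrupt, party 3 is byzantine.
inputs = {0: 1, 1: 1, 2: 1, 3: 0}
live, honest = {0, 1, 2}, {0, 1}
good = check_consensus(inputs, {0: 1, 1: 1, 2: 1, 3: 0}, live, honest)
bad = check_consensus(inputs, {0: 1, 1: 1, 2: 0, 3: 0}, live, honest)
print(good, bad)  # (True, True) (True, False)
```

The second execution shows what distinguishes this model: the honest parties agree, but consistency is still violated because the send-corrupt party outputs a different bit.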


3 On the Difficulty of Optimal Corruption Tolerance 
for Send-Corrupt Parties 


In this section we discuss the pathology of “standard” send corruptions with 
respect to current techniques in the literature, and describe why send corruptions 
appear as deleterious as full byzantine corruptions. Although our focus is on 
consensus protocols, we consider techniques for both consensus and broadcast; 
the two are related by a (folklore) reduction, which we discuss in the full version. 

We remark that there is evidence for the difficulty of send corruptions in the
classical literature. The impossibility proof by Dolev and Strong [7] that any
deterministic broadcast protocol requires at least tbyz + 1 rounds (for at most
tbyz byzantine corruptions) requires only dropping messages sent by parties that
otherwise act honestly. It follows immediately that any deterministic broadcast
protocol requires at least tsnd + 1 rounds (or more generally, at least tsnd + tbyz + 1
rounds). Moreover, there has been recent work by Chan, Pass, and Shi [6] to
extend the lower bound by Dolev and Strong to randomized protocols. Because
their adaptation also requires only dropping sent messages, their lower bound
also directly transfers to the send-corrupt model.

In Sect. 3.1, we show that the Dolev-Strong broadcast protocol fails as written
when considering send corruptions. We modify the protocol and show that with-
out new ideas, its corruption threshold degrades from n > tbyz (in the original
model) to n > 2(tsnd + tbyz). In Sect. 3.2, we visit recent techniques for security
against strongly rushing, adaptive adversaries and show that these also fail to
yield a corruption threshold better than n > 2tsnd + 2tbyz (which our construction
in Sect. 4 achieves) when requiring send-corrupt parties’ outputs to be consistent
with honest parties’ outputs.


3.1 Modifying Dolev-Strong Broadcast 


As an example of the pathology of send-corruptions, we now recall the classical 
authenticated broadcast protocol by Dolev and Strong [7]. Because the protocol 
is canon, we defer the original to the full version but review it here. 

The protocol uses a data structure that we will call a sig-chain. A 1-sig-chain
is a pair (m, σ), where σ is a signature on string m. For i > 1, an i-sig-chain
is a pair (m, σ), where m is an (i − 1)-sig-chain and σ is a signature on m. A
valid i-sig-chain is a sig-chain with the property that no two signatures in the
sig-chain are computed using the same key. An i-sig-chain contains a message
m’ if m’ is the message of the 1-sig-chain on which the sig-chain is built.
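The recursive definition of a sig-chain maps directly onto a small recursive data type. In the Python sketch below, a signature is idealized as the identity of the signing party, which assumes perfect unforgeability and is not an actual signature scheme; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SigChain:
    body: Union[str, "SigChain"]   # a message (1-sig-chain) or an inner sig-chain
    signer: int                    # stands in for an ideal signature by this party


def signers(c: SigChain) -> list:
    """Signers from the innermost (first) signature to the outermost."""
    inner = [] if isinstance(c.body, str) else signers(c.body)
    return inner + [c.signer]


def length(c: SigChain) -> int:
    return len(signers(c))         # an i-sig-chain carries i signatures


def is_valid(c: SigChain) -> bool:
    s = signers(c)
    return len(s) == len(set(s))   # no two signatures under the same key


def message(c: SigChain) -> str:
    return c.body if isinstance(c.body, str) else message(c.body)


chain = SigChain(SigChain("m", signer=0), signer=1)   # a valid 2-sig-chain
repeat = SigChain(chain, signer=0)                    # party 0 signs twice
print(length(chain), is_valid(chain), message(chain), is_valid(repeat))
# 2 True m False
```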

The protocol operates as follows: In the first round, the dealer creates a 1-
sig-chain containing its input and sends the sig-chain to all parties. In every
subsequent round i, any party that received a valid (i − 1)-sig-chain in the previous
round that did not contain a signature that it had computed creates an i-sig-chain
by appending its own signature to the chain. It then sends the i-sig-
chain to all parties. In any round i, if a party receives a valid i-sig-chain, then
it adds the message m contained in the sig-chain to a set of candidate outputs.
If the set of candidate outputs contains only one candidate at the end, then the
party outputs that message. Otherwise it outputs ⊥.


Where Dolev-Strong Fails. In the send-corruption model, the Dolev-Strong pro-
tocol fails because it is possible for send-corrupt parties to output some message
m while honest parties output ⊥. Consider an execution in which the parties
are partitioned into three sets: H contains all of the honest parties, S contains
all send-corrupt parties, and B contains all byzantine parties. Let the dealer be
send-corrupt. It is possible that in this execution, the send-corrupt parties com-
municate only with parties in S ∪ B. Then send-corrupt and byzantine parties
can collectively build a (tbyz + 1)-sig-chain containing m, while no honest party
ever receives the dealer’s message or any sig-chain containing the message. This
violates consistency.
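The failing execution can be reproduced in a few lines. The Python sketch below simulates the Dolev-Strong message flow with idealized signatures (a sig-chain is a message plus the tuple of signer identities) and one particular adversary: every message from a party in S ∪ B to a party outside that set is dropped. The function name, the parameters, and the choice of 5 parties with tbyz = 1 (so 2 rounds) are assumptions made for this example.

```python
def dolev_strong(n: int, cut_off: set, dealer: int, m: str, rounds: int) -> dict:
    """Simulate Dolev-Strong; messages from cut_off to other parties are dropped.

    Returns each party's output, with None standing for the symbol ⊥.
    """
    candidates = {p: set() for p in range(n)}
    outgoing = [(dealer, (m, (dealer,)))]          # round 1: dealer's 1-sig-chain
    for i in range(1, rounds + 1):
        inbox = {p: [] for p in range(n)}
        for sender, chain in outgoing:
            for q in range(n):
                if sender in cut_off and q not in cut_off:
                    continue                       # adversary drops this message
                inbox[q].append(chain)
        outgoing = []
        for p in range(n):
            for msg, sigs in inbox[p]:
                if len(sigs) == i and len(set(sigs)) == i:   # valid i-sig-chain
                    candidates[p].add(msg)
                    if p not in sigs:              # extend and forward next round
                        outgoing.append((p, (msg, sigs + (p,))))
    return {p: next(iter(c)) if len(c) == 1 else None
            for p, c in candidates.items()}


# Party 0 is a send-corrupt dealer, party 1 is byzantine, parties 2-4 are honest.
out = dolev_strong(n=5, cut_off={0, 1}, dealer=0, m="m", rounds=2)
print(out)  # {0: 'm', 1: 'm', 2: None, 3: None, 4: None}
```

The send-corrupt dealer is live and outputs m, while every honest party outputs ⊥, which is exactly the consistency violation described above.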


Modifications. In order to resolve this problem, we must make two modifications
to the protocol. First, a party must receive a (tsnd + tbyz + 1)-sig-chain for any
message that it will output; no chain of length less than tsnd + tbyz + 1 may add
a message to the set of candidate outputs. (This additionally requires that the
protocol is run for tsnd + tbyz + 1 rounds.) Second, we update the bounds to require
that n > 2tsnd + 2tbyz. A majority of honest parties is necessary to ensure that
honest parties can always build a (tsnd + tbyz + 1)-sig-chain without the assistance
of byzantine or send-corrupt parties, which is necessary for validity.
We present our modified Dolev-Strong protocol Π^ModDS in Fig. 1.


Theorem 1. Π^ModDS is a (tsnd, tbyz)-secure broadcast protocol for n > 2tsnd +
2tbyz.


Proof. The proof is similar to the original by Dolev and Strong, subject to the
modifications described above. Validity follows from the fact that when
n > 2tsnd + 2tbyz and the dealer is honest, the honest parties build a
(tsnd + tbyz + 1)-sig-chain, and that no sig-chain can exist containing some m’
that the dealer did not send. Consistency follows from the fact that if a
(tsnd + tbyz + 1)-sig-chain exists, then some honest party’s signature must be
included. It follows that if any honest party outputs m, then all honest parties
receive a (tsnd + tbyz + 1)-sig-chain containing m. Assume that some honest party
receives a (tsnd + tbyz + 1)-sig-chain containing m and another honest party
receives a (tsnd + tbyz + 1)-sig-chain containing m’. Then both sig-chains must
include an honest signature, and therefore there must be (tsnd + tbyz + 1)-sig-chains
containing m and m’ in the view of every honest party. It follows that every
honest and send-corrupt party outputs ⊥.


Protocol 1 Modified Dolev-Strong Broadcast Protocol Π^ModDS

Shared Setup: Public Key Infrastructure (PKI) for a signature scheme.

Inputs: The dealer D ∈ P has an input m ∈ {0,1}*.

Outputs: Each party p ∈ P outputs a value m’ ∈ {0,1}* ∪ {⊥}.

Local Variable: Each party p ∈ P maintains a local variable S, which is a set initialized
to {}.

Protocol: The protocol begins at time 0 and proceeds in rounds. In each round, party p
proceeds as follows:


1. Round 1: Dealer’s Messages. The dealer D signs its input, σ ← sign_D(m), and
sends (m, σ) to all parties.

2. Sig Chains: For every round i from 2 to tsnd + tbyz + 1: For every valid (i − 1)-sig-
chain c that p received at the end of round i − 1 in which none of the signatures
were constructed by p, p computes σ ← sign_p(c) and sends (c, σ) to all parties.

3. Output: For every valid (tsnd + tbyz + 1)-sig-chain c that p received at the end of
round tsnd + tbyz + 1, let m’ be the message contained by c and update S = S ∪ {m’}.
If |S| = 1, then p outputs the element m’ ∈ S. If |S| ≠ 1, then p outputs ⊥.


Fig. 1. Modified Dolev-Strong Broadcast Protocol Π^ModDS
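The effect of the two modifications can be observed by simulating the three steps of Fig. 1 against the same partitioning adversary. As before, this Python sketch idealizes signatures as tuples of signer identities and drops every message from a chosen set of parties to parties outside that set; the function name, the parameters, and the example sizes are assumptions made for illustration, and the simulation does not replace the proof.

```python
def modified_dolev_strong(n, t_snd, t_byz, dealer, m, cut_off):
    """Simulate the modified protocol; None stands for the symbol ⊥."""
    assert n > 2 * (t_snd + t_byz)                 # bound required by Theorem 1
    k = t_snd + t_byz + 1                          # chain length and round count
    S = {p: set() for p in range(n)}               # candidate outputs per party
    outgoing = [(dealer, (m, (dealer,)))]          # step 1: dealer's message
    for i in range(1, k + 1):
        inbox = {p: [] for p in range(n)}
        for sender, chain in outgoing:
            for q in range(n):
                if sender in cut_off and q not in cut_off:
                    continue                       # adversary drops this message
                inbox[q].append(chain)
        outgoing = []
        for p in range(n):
            for msg, sigs in inbox[p]:
                if len(sigs) != i or len(set(sigs)) != i:
                    continue                       # not a valid i-sig-chain
                if i == k:
                    S[p].add(msg)                  # step 3: only full-length chains
                elif p not in sigs:
                    outgoing.append((p, (msg, sigs + (p,))))   # step 2
    return {p: next(iter(S[p])) if len(S[p]) == 1 else None for p in range(n)}


# Party 0 is send-corrupt, party 1 is byzantine, parties 2-4 are honest.
corrupt_dealer = modified_dolev_strong(5, 1, 1, dealer=0, m="m", cut_off={0, 1})
honest_dealer = modified_dolev_strong(5, 1, 1, dealer=2, m="m", cut_off={0, 1})
print(corrupt_dealer)  # {0: None, 1: None, 2: None, 3: None, 4: None}
print(honest_dealer)   # {0: 'm', 1: 'm', 2: 'm', 3: 'm', 4: 'm'}
```

With a send-corrupt dealer, the two cut-off parties cannot assemble a chain of length 3 on their own, so every party outputs ⊥ and consistency holds. With an honest dealer, the three honest parties build the full-length chain without assistance and every party outputs m.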


Can Dolev-Strong Be Fixed to Support n > tsnd + tbyz? We have shown that with-
out new ideas, Dolev-Strong cannot be updated to tolerate n > tsnd + tbyz (which
it is easy to prove is an optimal corruption budget). However, we cannot rule out
such a threshold. In the pathological execution described above, honest parties
do not send any messages if they do not receive any valid sig-chains. However,
honest parties may send messages in each round containing ⊥, indicating “I have
not received a message,” which conveys that the party’s sent message was not
dropped. This provides more information to the protocol, but we do not know
how to use such a technique to improve broadcast.


3.2 Recent Techniques for Adaptive, Strongly Rushing Adversaries


Recent techniques for byzantine agreement and broadcast against a strongly 
rushing adversary also fail when requiring consistency between send-corrupt par- 
ties’ outputs and honest parties’ outputs. For example, the byzantine agreement 
protocol by Abraham et al. [2] and the broadcast protocol by Wan et al. [21] 
achieve security against a strongly adaptive adversary by effectively committing 
to any leader’s messages early in the protocol, and then revealing a leader in 
a later round. This thwarts strongly rushing adaptive adversaries because by 
the time a leader is elected, it is too late to corrupt the leader and remove the 
messages it has sent. 

In the partitioning attack, send-corrupt parties are able to send messages to
each other but not to the honest parties, and they are able to reach signature
thresholds on messages that no honest party ever receives. For example, in [2],
messages often require b + 1 distinct signatures (implying at least one honest 
party signed a message) in order to be recognized by an honest party. But when 
there are more send-corrupt parties than honest parties, any threshold number 
of signatures that honest parties must be able to attain on their own must also 
be attainable by send-corrupt parties only. This can cause send-corrupt parties 
to adopt a different leader in some step than the honest parties. Similarly, in 
[21], send-corrupt parties’ puzzles may never be delivered to honest parties. 
When honest parties choose a leader based on the solutions to a set of time-lock 
puzzles, send-corrupt parties may make a decision based on a larger set than the 
honest parties, and their decisions may differ. This form of attack is prevented 
by the implicit echoing assumption in [21], but it does not carry into the send- 
corrupt model. In our model, this attack is thwarted by requiring the number 
of honest parties be greater than 2(tsnd + tbyz), as thresholds on the number of 
signatures can enforce that some honest party signs a message. 


4 Constant-Round Synchronous Consensus
for n > trcv + 2tsnd + 2tbyz


We now present a protocol for consensus in synchronous networks in the presence
of send corruptions, receive corruptions, and byzantine corruptions where digi-
tal signatures are available. We prove that the protocol is (tsnd, trcv, tbyz)-secure
for n > trcv + 2tsnd + 2tbyz. In the full version, we show that the same protocol is
(tsnd, trcv, tbyz)-secure for n > trcv + tsnd + 2tbyz when send corruptions are spotty,
and that this corruption budget is optimal.

Towards presenting our consensus protocol, we first present protocols for 
weak broadcast, weak consensus, and graded consensus. Each protocol is used 
as a building block in our ultimate consensus protocol. Due to space constraints, 
we defer the proofs of most of our protocols to the full version. Before introducing 
these building blocks, we introduce another protocol for reliable sending when 
all parties send messages to each other. 


4.1 All-to-All FixReceive 


Our All-to-All FixReceive protocol is similar to FixReceive from [23], tuned for 
the common scenario in which all parties attempt to send a message to all other 
parties. The parties forward all unique messages that they receive, in order to 
ensure that every party either receives message that was sent, or detects that it 
is receive-corrupted. The parties output all unique messages that they receive. 
A party detects whether it is receive-corrupt based on the number of messages 
it receives; if so, it becomes a zombie and notifies the other parties. We prove that 
a receive-corrupt party that does not become a zombie must receive a message 
from another honest or send-corrupt party. We then prove that if some honest 
party attempts to send a message m via the protocol, then every non-zombie 
party must receive that message. The proofs are deferred to the full version. 
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Protocol 2 All-To-All FixReceive Protocol TF? (tena, trev, tbyz) 

Inputs: Each party p € P has an input m € {0, 1}*. 

Outputs:Each party p € P outputs some message for every other party in P, or outputs 
zombie. 


Protocol: The protocol proceeds in two rounds, in which every party sends its input m 
to every other party, and then parties forward the unique messages they have received, 
as follows: 


1. Send Messages: Each party sends its signed input m to every other party. 

2. Replay: Every party forwards every unique message that it received in Round 1
to every other party. If a party did not receive any unique messages in Round 1,
it sends $\bot$ to every other party.

3. Output: If a party $p$ does not receive more than $n - t_{snd} - t_{byz}$ messages (including
$\bot$) in either round, it sends zombie to all parties and outputs zombie. Otherwise,
$p$ outputs the set of unique messages that it received in Round 2.


Fig. 2. All-to-all FixReceive protocol $\Pi^{FR}$


Lemma 1 (Zombies in $\Pi^{FR}$). Any party $p$ becomes a zombie during $\Pi^{FR}$
only when it is receive-corrupt. If $p$ does not become a zombie then it received a
message from at least one honest or send-corrupt party.


Lemma 2 (Honest and Receive-Corrupt Send to All). If an honest party
or receive-corrupt party (but not send-corrupt) sends a message $m$ using $\Pi^{FR}$,
then every live party receives $m$ or becomes a zombie.
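
The output rule in step 3 of Fig. 2 can be illustrated with a minimal sketch. The function name, the dictionary-based message model (one entry per sender heard from), and the use of `None` for $\bot$ are assumptions made here for illustration; the comparison follows the wording of step 3 literally.

```python
BOT = None          # stand-in for the placeholder "bottom" symbol
ZOMBIE = "zombie"


def fix_receive_output(round1, round2, n, t_snd, t_byz):
    """Sketch of the output rule in step 3 (hypothetical helper).

    round1: dict sender -> message (or BOT) received in Round 1
    round2: dict sender -> set of forwarded messages (or BOT) received in Round 2
    """
    threshold = n - t_snd - t_byz
    # The text requires strictly more than the threshold in both rounds;
    # otherwise the party concludes it is receive-corrupt.
    if len(round1) <= threshold or len(round2) <= threshold:
        return ZOMBIE
    unique = set()
    for forwarded in round2.values():
        if forwarded is not BOT:
            unique |= set(forwarded)
    return unique
```

A party that hears from too few senders in either round returns `ZOMBIE`; otherwise it returns the union of everything forwarded to it in Round 2.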


4.2 Weak Broadcast 


Our first building block is a weak broadcast primitive. In a weak broadcast 
protocol, a dealer $D \in P$ wishes to send a message $m \in \{0,1\}^*$ to the parties
in $P$. Each party $p \in P$ outputs a message $m' \in \{0,1\}^* \cup \{\bot\}$, subject to the
following constraints:


Definition 4 (Weak Broadcast). Let $\Pi$ be a protocol for parties $P =
\{p_1, \ldots, p_n\}$ in which a distinguished party $D \in P$ holds an input $m \in \{0,1\}^*$. $\Pi$
is a Weak Broadcast protocol if the following properties hold except with
negligible probability.


1. $(t_{snd}, t_{rcv}, t_{byz})$-Validity: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-valid if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which $D$ is honest or receive-corrupt
(but not send-corrupt), every live party outputs $m$.

2. $(t_{snd}, t_{rcv}, t_{byz})$-Unanimity: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-unanimous if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which $D$ is live, either every live party
outputs $m \in \{0,1\}^*$ or every live party outputs $\bot$.

3. $(t_{snd}, t_{rcv}, t_{byz})$-Consistency: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-consistent if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which any honest party outputs
$m' \in \{0,1\}^*$, every live party outputs $m'$ or $\bot$.


Protocol 3 Weak Broadcast Protocol $\Pi^{WB}$

Shared Setup: Public key infrastructure for a signature scheme; every party knows the
identity of the dealer and its public key $pk$.

Inputs: The dealer $D \in P$ has an input $m \in \{0,1\}^*$.

Outputs: Each party $p_i \in P$ outputs a value $m' \in \{0,1\}^* \cup \{\bot\}$.

Protocol: The protocol begins at time 0 and proceeds in rounds, in which each round
lasts for $\Delta$ time. In each round, party $p$ proceeds as follows:


1. Dealer’s Messages: The dealer $D$ signs its input, $\sigma \leftarrow \mathrm{sign}_{sk}(m)$, and sends
(deal, $m$, $\sigma$) to all parties, where $\sigma$ is the signature on $m$ using its secret signing
key $sk$.

2. Echo Dealer’s Value: Parties run $\Pi^{FR}$ based on the messages they received
from $D$. If $p$ received a message from $D$, let $(m', \sigma)$ be the message and signature
that $p$ received. $p$ inputs (echo, $m'$, $\sigma$) to $\Pi^{FR}$. Otherwise, $p$ inputs (echo, $\bot$, $\bot$) to
$\Pi^{FR}$.

3. Replay: Parties again run $\Pi^{FR}$ based on the messages they received in the previous
round, where each party provides all of the unique messages it received in the
previous $\Pi^{FR}$ as input.

4. Verification and Output: If $p$ did not output any messages signed with $D$’s key
from the first run of $\Pi^{FR}$, then it outputs $\bot$. If in the outputs of the second run
of $\Pi^{FR}$, $p$ receives any two pairs $(m_i, \sigma_i)$ and $(m_j, \sigma_j)$ such that $m_i \neq m_j$ but
$\mathrm{ver}_{pk}(\sigma_i) = 1$ and $\mathrm{ver}_{pk}(\sigma_j) = 1$, then $p$ outputs $\bot$. Otherwise, $p$ outputs the unique
message $m'$ that it received in the first run of $\Pi^{FR}$ whose signature verifies with
$D$’s public key.


Fig. 3. Weak broadcast protocol $\Pi^{WB}$


4. $(t_{snd}, t_{rcv}, t_{byz})$-Termination: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-terminating if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution, every live party outputs some
$m \in \{0,1\}^* \cup \{\bot\}$ and terminates within finitely many steps.


If $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-valid, $(t_{snd}, t_{rcv}, t_{byz})$-consistent, and
$(t_{snd}, t_{rcv}, t_{byz})$-terminating then we call it $(t_{snd}, t_{rcv}, t_{byz})$-secure. If $\Pi$ is additionally
$(t_{snd}, t_{rcv}, t_{byz})$-unanimous, then we call it $(t_{snd}, t_{rcv}, t_{byz})$-secure with unanimity.


Our protocol for weak broadcast is presented in Fig. 3. It follows a standard
construction, adapted for our corruption model by invoking $\Pi^{FR}$ to distribute
messages. It permits a designated dealer to send an arbitrary message $m$ to all
parties, with the guarantee that every party outputs either $m$ or $\bot$.
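
The verification and output step of Fig. 3 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a keyed HMAC stands in for the dealer's signature scheme (an HMAC is a MAC, not a publicly verifiable signature), and returning $\bot$ when the first run contains no unique valid message is our reading of "the unique message".

```python
import hashlib
import hmac

BOT = None  # stand-in for the "bottom" output


def toy_sign(key: bytes, message: bytes) -> bytes:
    # Stand-in for the dealer's signing operation (assumption: HMAC-SHA256).
    return hmac.new(key, message, hashlib.sha256).digest()


def toy_verify(key: bytes, message: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, message), sig)


def weak_broadcast_output(first_run, second_run, verify):
    """first_run / second_run: iterables of (message, signature) pairs
    output by the two FixReceive runs; verify(message, signature) -> bool."""
    valid_first = {m for m, sig in first_run if verify(m, sig)}
    valid_second = {m for m, sig in second_run if verify(m, sig)}
    if not valid_first:
        return BOT          # nothing signed by the dealer in the first run
    if len(valid_second) > 1:
        return BOT          # two distinct dealer-signed messages: equivocation
    if len(valid_first) != 1:
        return BOT          # assumption: no unique message to output
    return next(iter(valid_first))
```

Invalid signatures are simply ignored, so a forged pair in the second run does not force a $\bot$ output.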


Lemma 3 (Security of Weak Broadcast $\Pi^{WB}$). Protocol $\Pi^{WB}(t_{snd}, t_{rcv}, t_{byz})$
is a $(t_{snd}, t_{rcv}, t_{byz})$-secure weak broadcast protocol for $n > t_{snd} + t_{rcv} + 2t_{byz}$.


Proof. The proof is deferred to the full version. 


We provide an additional statement about the outputs of $\Pi^{WB}$ when the
dealer is corrupt but not byzantine. Specifically, consistency holds over the outputs
of all live parties when the dealer is send-corrupt (and not only when some
honest party outputs $m \neq \bot$).


Lemma 4. When the dealer is send-corrupt, if one live party outputs $m \neq \bot$,
then every live party outputs $m' \in \{m, \bot\}$.


Proof. Follows directly from unforgeability of the idealized signature scheme. 


4.3 Weak Consensus 


We use weak consensus as a stepping stone to achieve consensus. In a weak 
consensus protocol, all honest parties have an input $b \in \{\bot, 0, 1\}$, and all honest
parties are expected to output a value $v \in \{\bot, 0, 1\}$, subject to the following:


Definition 5 (Weak Consensus). Let $\Pi$ be a protocol for parties $P =
\{p_1, \ldots, p_n\}$ in which every party $p \in P$ has an input $b \in \{0,1\}$. $\Pi$ is a Weak
Consensus protocol if the following properties hold except with negligible
probability.


1. $(t_{snd}, t_{rcv}, t_{byz})$-Validity: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-valid if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which all honest parties have the same input $b$ and no
live parties have input $1 - b$, all honest parties output $b$.

2. $(t_{snd}, t_{rcv}, t_{byz})$-Consistency: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-consistent if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which any live party outputs $v \in \{0,1\}$,
no live party outputs $1 - v$.

3. $(t_{snd}, t_{rcv}, t_{byz})$-Termination: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-terminating if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution, every live party outputs $v \in \{\bot, 0, 1\}$ and
terminates within finitely many steps.


If $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-valid, $(t_{snd}, t_{rcv}, t_{byz})$-consistent, and
$(t_{snd}, t_{rcv}, t_{byz})$-terminating then we call it $(t_{snd}, t_{rcv}, t_{byz})$-secure.


We present our weak consensus protocol $\Pi^{WC}$ in Fig. 4. The protocol is an
adaptation of the reduction from Weak Consensus to Weak Broadcast [9], and
proceeds in two synchronous rounds. First, in parallel, each party signs its protocol
input and sends its signed input to all parties. Second, upon receiving all
other parties’ inputs, each party attempts to generate a certificate in favor of
some output value. A certificate for a bit $u$ is a set of $n - t_{snd} - t_{rcv} - t_{byz}$ unique,
valid signatures on $u$. If a party is able to generate a certificate, it sends the
certificate to all other parties.

A party outputs a bit $v$ only if it meets three conditions: (1) it must generate
a certificate for $v$ in the beginning of the second round; (2) it must receive at least
$n - t_{snd} - t_{rcv} - t_{byz}$ valid certificates for $v$ from distinct parties; (3) it must not receive
a valid certificate for $1 - v$ from any other party. Otherwise it outputs $\bot$.

Intuitively, validity of the protocol is guaranteed by the fact that if all live
parties have input $b$, then all honest parties will be able to construct a certificate
for $b$, and there will not be enough corrupt parties to construct a certificate for
$1 - b$. Consistency is guaranteed by the fact that if two live parties are able to
generate certificates for opposite values, then they must share their certificates
with each other, and then both output $\bot$.
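
The certificate construction and the three output conditions can be sketched as below. The data model is an assumption made for illustration: signatures are taken as already verified, a certificate is represented as a pair (bit, tuple of signers), and the helper names are hypothetical. If both bits happened to reach the quorum, this sketch simply picks 0 first; the text does not specify a tie rule.

```python
BOT = None  # stand-in for the "bottom" output


def quorum(n, t_snd, t_rcv, t_byz):
    return n - t_snd - t_rcv - t_byz


def build_certificate(signed_inputs, n, t_snd, t_rcv, t_byz):
    """signed_inputs: dict sender -> bit, containing only validly signed inputs.
    Returns (bit, signers) or None if no bit gathers a quorum of signatures."""
    q = quorum(n, t_snd, t_rcv, t_byz)
    for v in (0, 1):
        signers = sorted(s for s, b in signed_inputs.items() if b == v)
        if len(signers) >= q:
            return (v, tuple(signers[:q]))
    return None


def weak_consensus_output(own_cert, received_certs, n, t_snd, t_rcv, t_byz):
    """received_certs: dict sender -> (bit, signers), valid certificates only."""
    if own_cert is None:
        return BOT                                   # condition (1) fails
    v = own_cert[0]
    support = sum(1 for bit, _ in received_certs.values() if bit == v)
    conflict = any(bit == 1 - v for bit, _ in received_certs.values())
    if support >= quorum(n, t_snd, t_rcv, t_byz) and not conflict:
        return v                                     # conditions (2) and (3) hold
    return BOT
```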


Lemma 5 (Security of $\Pi^{WC}$). Protocol $\Pi^{WC}(t_{snd}, t_{rcv}, t_{byz})$ is a $(t_{snd}, t_{rcv}, t_{byz})$-secure
Weak Consensus protocol in synchronous networks for $n > t_{rcv} + t_{snd} + 2t_{byz}$.


Proof. The proof is deferred to the full version. 


Protocol 4 Weak Consensus $\Pi^{WC}(t_{snd}, t_{rcv}, t_{byz})$

Shared Setup: Public key infrastructure for a signature scheme.

Inputs: Each party $p \in P$ has an input $b \in \{\bot, 0, 1\}$ and a secret signing key for the
signature scheme.

Outputs: Each party $p \in P$ outputs a value $v \in \{\bot, 0, 1\}$.

Protocol: The protocol begins at time 0 and proceeds in rounds, in which each round
lasts for $\Delta$ time. Each party $p_i$ proceeds as follows:


1. Sign Inputs: In parallel, each party signs its input bit and sends its signed input
to all other parties.

2. Construct Certificates and WB: Each party collects all of the signed input bits
from the other parties. If there is a $v \in \{0,1\}$ for which $n - t_{snd} - t_{rcv} - t_{byz}$ valid signed
messages are received, $p$ constructs a certificate composed of $n - t_{snd} - t_{rcv} - t_{byz}$
signatures from distinct parties on $v$. The parties then invoke $n$ weak broadcasts
in parallel, in which $p_i$ is the dealer in the $i$th weak broadcast, and $p_i$ provides its
certificate as input if it has one; otherwise $p_i$ provides $\bot$ as its input.

3. Output: Each party receives any certificates sent to it in Round 2. If $p$ constructed
a certificate for some $v$ in Round 2 AND $p$ has received at least $n - t_{snd} - t_{rcv} - t_{byz}$
certificates for $v$ by the end of Round 2 from distinct parties AND $p$ has not received
a valid certificate for $1 - v$, then $p$ outputs $v$. Otherwise, $p$ outputs $\bot$.


Fig. 4. Weak consensus protocol $\Pi^{WC}$


4.4 Graded Consensus 


We define an additional weakened form of consensus called graded consensus, 
which was originally introduced by Feldman and Micali [8]. In a graded consensus 
protocol, each party has an input $b \in \{0,1\}$. Each party is expected to output a
pair $(v, g) \in \{0,1\}^2$, where $v$ is the output bit and $g$ is a grade.


Definition 6 (0/1 Graded Consensus). Let $\Pi$ be a protocol for parties
$P = \{p_1, \ldots, p_n\}$ where each party has input $b \in \{\bot, 0, 1\}$. $\Pi$ is a 0/1 Graded
Consensus protocol if the following properties hold except with negligible
probability.


1. $(t_{snd}, t_{rcv}, t_{byz})$-Validity: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-valid if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which all honest parties have the same input $b \in \{0,1\}$
and no live parties have input $1 - b$, all live parties output $(b, 1)$.

2. $(t_{snd}, t_{rcv}, t_{byz})$-Consistency: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-consistent if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution in which any live party outputs $(v, 1)$,
every live party outputs $(v, g) \in \{0,1\}^2$.


Protocol 5 Graded Consensus $\Pi^{GC}(t_{snd}, t_{rcv}, t_{byz})$

Inputs: Each party $p \in P$ has an input $b \in \{\bot, 0, 1\}$.

Outputs: Each party $p \in P$ outputs a pair $(v, g) \in \{0,1\}^2$.

Protocol: The protocol begins at time 0 and proceeds in synchronous rounds, labeled
below, where each round lasts long enough for its corresponding subprotocol to
complete. Each party $p$ proceeds as follows:


1. Weak Consensus: Run $\Pi^{WC}$ with $b$ as input. Let $b'$ denote the output of $\Pi^{WC}$.
2. Weak Broadcast: In parallel, all parties invoke $n$ copies of $\Pi^{WB}(t_{snd}, t_{rcv}, t_{byz})$,
where $p_j$ is the dealer in the $j$th copy. $p_j$ uses the value $b'$ as its input to $\Pi^{WB}$. For
$u \in \{\bot, 0, 1\}$, let $n_u$ denote the number of weak broadcasts for which $p$ outputs $u$.
3. Output:
- Assign $v \leftarrow u \in \{0,1\}$ for which $n_u > n_{1-u}$. Break ties by assigning $v \leftarrow 1$.
Assign $g \leftarrow 1$ if $n_v > n - t_{byz} - t_{rcv} - t_{snd}$. Else $g \leftarrow 0$. Output $(v, g)$.


Fig. 5. Graded consensus protocol $\Pi^{GC}$


3. $(t_{snd}, t_{rcv}, t_{byz})$-Termination: $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-terminating if in every
$(t_{snd}, t_{rcv}, t_{byz})$-compliant execution, every live party outputs $(v, g) \in \{0,1\}^2$
and terminates within finitely many steps.


If $\Pi$ is $(t_{snd}, t_{rcv}, t_{byz})$-valid, $(t_{snd}, t_{rcv}, t_{byz})$-consistent, and
$(t_{snd}, t_{rcv}, t_{byz})$-terminating then we call it $(t_{snd}, t_{rcv}, t_{byz})$-secure.


Our graded consensus protocol $\Pi^{GC}$ is presented in Fig. 5; it is an adaptation
to our fault model of the reduction of graded consensus to weak broadcast discussed
by Fitzi [9]. Specifically, $\Pi^{GC}$ proceeds in synchronous rounds in which
two subprotocols are invoked. First, parties invoke a weak consensus protocol,
using their protocol inputs as input to the weak consensus protocol. Second, in
parallel, all parties weak broadcast their outputs from the weak consensus protocol.
Parties determine their outputs based on the weak broadcasts they receive.
First, a party sets the bit $v$ to the value $u \in \{0,1\}$ for which it received more
weak broadcasts carrying $u$ than $1 - u$. Second, a party sets its grade $g$ to 1 if
it receives more than $n - t_{byz} - t_{rcv} - t_{snd}$ weak broadcasts carrying bit $v$, and sets
its grade to 0 otherwise. It then outputs $(v, g)$. Intuitively, each party outputs
a bit $v$ based on the majority of weak broadcasts that it has received. A party
outputs grade 1 if it has received a large enough majority of weak broadcasts
carrying $v$ that it is guaranteed no other honest party has received a majority
of weak broadcasts carrying $1 - v$. The proof follows from a quorum argument.
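
The output rule just described is small enough to write out. The following is a sketch only; the function name and the list-based input (one weak-broadcast output per dealer, with `None` for $\bot$) are assumptions introduced for illustration.

```python
from collections import Counter

BOT = None  # stand-in for the "bottom" output of a weak broadcast


def graded_output(wb_outputs, n, t_snd, t_rcv, t_byz):
    """wb_outputs: list with one entry per weak broadcast, each 0, 1 or BOT.
    Returns the pair (v, g) chosen by the output step."""
    counts = Counter(wb_outputs)
    # Pick the bit carried by more weak broadcasts; ties go to 1.
    v = 0 if counts[0] > counts[1] else 1
    # Grade 1 requires strictly more than n - t_byz - t_rcv - t_snd supporters.
    g = 1 if counts[v] > n - t_byz - t_rcv - t_snd else 0
    return v, g
```

For example, with $n = 7$ and one corruption of each type, five weak broadcasts carrying 1 yield $(1, 1)$, while a 3-to-2 split in favor of 0 yields $(0, 0)$.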


Lemma 6 (Security of $\Pi^{GC}$). Protocol $\Pi^{GC}(t_{snd}, t_{rcv}, t_{byz})$ is a $(t_{snd}, t_{rcv}, t_{byz})$-secure
graded consensus protocol in synchronous networks for $n > t_{rcv} + 2t_{snd} + 2t_{byz}$.


Proof. The proof is deferred to the full version. 


4.5 Expected Constant-Round Consensus


In Fig. 6 we present $\Pi^{*}$, our expected constant-round protocol for consensus. The
protocol follows the standard coin-loop paradigm to go from graded consensus
to byzantine agreement. To ensure termination, the protocol ensures that when
a party terminates, it holds a certificate that it can send to all parties in order
to make them terminate with the same value.


Theorem 2 (Main Theorem). $\Pi^{*}(t_{snd}, t_{rcv}, t_{byz})$ is a $(t_{snd}, t_{rcv}, t_{byz})$-secure
consensus protocol in synchronous networks for $n > t_{rcv} + 2t_{snd} + 2t_{byz}$, where a
common coin primitive is available.


Proof. The proof is deferred to the full version. 
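
To get a feel for the resilience condition of Theorem 2, the following worked instance plugs in illustrative numbers. The concrete values are our own choice and do not come from the paper.

```latex
\[
n = 10,\quad t_{snd} = 2,\quad t_{rcv} = 1,\quad t_{byz} = 1
\;\Longrightarrow\;
t_{rcv} + 2t_{snd} + 2t_{byz} = 1 + 4 + 2 = 7 < 10 = n .
\]
\[
\text{Raising } t_{snd} \text{ to } 4 \text{ gives } 1 + 8 + 2 = 11 > 10,
\text{ so the condition } n > t_{rcv} + 2t_{snd} + 2t_{byz} \text{ no longer holds.}
\]
```

Each send corruption and each byzantine corruption counts twice against $n$ in this bound, whereas a receive corruption counts once.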


Protocol 6 Expected Constant-Round Protocol $\Pi^{*}(t_{snd}, t_{rcv}, t_{byz})$

Common Setup: The parties have access to a public key infrastructure for some signature
scheme.

Inputs: Each party $p \in P$ has an input $b \in \{0,1\}$.

Outputs: Each party $p \in P$ outputs some $b' \in \{0,1\}$.

Internal Variable: Each party maintains a variable $v \in \{0,1\}$ which is initialized to $b$.
For each $u \in \{0,1\}$, each party also maintains a set $D_u$ of distinct (decide, $u$) messages
that it has received.

Protocol: The protocol begins at time 0 and proceeds in synchronous rounds. Each
party $p$ proceeds as follows:


- Loop starting with iteration $i = 0$ until terminating:

1. Subround A (Graded Consensus): Run $\Pi^{GC}(t_{snd}, t_{rcv}, t_{byz})$ with $v$ as input.
Let $(u, g)$ denote $p$’s output of $\Pi^{GC}$.

2. Subround B (Common Coin): Invoke a common coin protocol and
assign its output to $y_i$.

3. Conditional Update: If $g = 0$, then update $v \leftarrow y_i$. If $g = 1$, then update
$v \leftarrow u$.

4. Conditional Decision: If $g = 1$ and $v = y_i$: sign (decide, $v$), send the signed
message to all parties, and output $v$.

5. Certificate Send: All parties invoke $\Pi^{FR}$, where any party that has generated
or received a certificate since the last invocation of $\Pi^{FR}$ provides the
certificate as input, and terminates after $\Pi^{FR}$. Any party that does not have
a certificate inputs $\bot$.

- Certificate: Upon receiving a signed (decide, $u$) message from any party, add the
message to $D_u$. When $D_u$ contains at least $t_{byz} + 1$ messages from distinct parties,
construct a certificate of $t_{byz} + 1$ (decide, $u$) messages from distinct parties. Upon
receiving a certificate, output $u$ (if it has not already output).


Fig. 6. Expected constant-round consensus protocol $\Pi^{*}$
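
Steps 3 and 4 of the loop in Fig. 6, together with the certificate threshold, can be sketched as below. The helper names are hypothetical, signatures on (decide, $u$) messages are assumed to be verified elsewhere, and the coin value is passed in as a plain bit.

```python
def loop_step(u, g, coin):
    """One pass over the conditional update and conditional decision.

    u, g: output of graded consensus in this iteration
    coin: the common coin value of this iteration
    Returns (v, decides), where decides says whether (decide, v) is sent."""
    v = u if g == 1 else coin          # step 3: keep u on grade 1, else adopt the coin
    decides = (g == 1 and v == coin)   # step 4: decide only if the coin agrees
    return v, decides


def has_certificate(decide_msgs, u, t_byz):
    """decide_msgs: iterable of (sender, bit) pairs for received decide messages.
    A certificate for u needs t_byz + 1 messages from distinct senders."""
    senders = {sender for sender, bit in decide_msgs if bit == u}
    return len(senders) >= t_byz + 1
```

Counting distinct senders matters: with $t_{byz} + 1$ distinct signers, at least one decide message comes from a non-byzantine party.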


Abstract. Custom currencies (ERC-20) on Ethereum are wildly popular,
but they are second class to the primary currency Ether. Custom currencies
are more complex and more expensive to handle than the primary
currency as their accounting is not natively performed by the underlying 
ledger, but instead in user-defined contract code. Furthermore, and quite 
importantly, transaction fees can only be paid in Ether. In this paper, we 
focus on being able to pay transaction fees in custom currencies. We achieve 
this by way of a mechanism permitting short term liabilities to pay trans- 
action fees in conjunction with offers of custom currencies to compensate 
for those liabilities. This enables block producers to accept custom curren- 
cies in exchange for settling liabilities of transactions that they process. 
We present formal ledger rules to handle liabilities together with the 
concept of babel fees to pay transaction fees in custom currencies. We also 
discuss how clients can determine what fees they have to pay, and we 
present a solution to the knapsack problem variant that block producers 
have to solve in the presence of babel fees to optimise their profits. 


1 Introduction 


Custom currencies, usually following the ERC-20 standard, are one of the most 
popular smart contracts deployed on the Ethereum blockchain. These currencies 
are however second class to the primary currency Ether. Custom tokens are not 
natively traded and accounted for by the Ethereum ledger; instead, part of the 
logic of an ERC-20 contract replicates this transfer and accounting functional- 
ity. The second class nature of custom tokens goes further, though: transaction 
processing and smart contract execution fees can only be paid in Ether—even 
by users who have got custom tokens worth thousands of dollars in their wallets. 

The above two limitations and the disadvantages they introduce seem hard 
to circumvent. After all, it seems unavoidable that custom tokens must be issued 
by a smart contract and interacting with a smart contract requires fees in the
primary currency. Still, recent work addressing the first limitation showed that
it can be tackled: by introducing native custom tokens (see e.g., [6]) it is possible 
to allow custom tokens to reuse the transfer and accounting logic that is already 
part of the underlying ledger. This is achieved without the need for a global reg- 
istry or similar global structure via the concept of token bundles in combination 
with token policy scripts that control minting and burning of custom tokens. 
Nevertheless, even with native custom tokens, transaction fees still need to be 
paid in the primary currency of the underlying ledger. 

To the best of our knowledge the only known technique to tackle the sec- 
ond limitation is in the context of Ethereum: the Ethereum Gas Station Network 
(GSN).¹ The GSN attempts to work around this inability to pay fees with custom
tokens by way of a layer-2 solution, where a network of relay servers accepts fee-less 
meta-transactions off-chain and submits them, with payment, to the Ethereum 
network. In return for this service, the GSN may accept payment in other denomi- 
nations, such as custom tokens. Meta-transactions have the downside that in order 
to remove trust from intermediaries, custom infrastructure in every smart con- 
tract that wants to accept transactions via the GSN is needed. This has the seri- 
ous downside that GSN users are only able to engage with the subset of the ledger 
state that explicitly acknowledges the GSN network. Beyond reducing the scope of 
GSN transactions, this introduces additional complexity on smart contract devel- 
opment including the fact that participating smart contracts must be pre-loaded 
with funds to pay the GSN intermediaries for their services. 

Motivated by the above, we describe a solution that lifts this second limita- 
tion of custom tokens entirely and without requiring any modification to smart 
contract design. More specifically, we introduce the concept of babel fees, where 
fee payment is possible in any denomination that another party values sufficiently 
to pay the actual transaction fee in the primary currency. Our requirements for 
babel fees go beyond what GSN offers and are summarized as follows: (1) par- 
ticipants that create a babel fee transaction should be able to create a normal 
transaction, which will be included in the ledger exactly as is (i.e., no need for 
meta-transactions or specially crafted smart contract infrastructure) and (2) the 
protocol should be non-interactive in the sense that a single message from the 
creator of a transaction to the participant paying the fee in the primary currency 
should suffice. In other words, we want transaction creation and submission to be 
structurally the same for transactions with babel fees as for regular transactions. 

Our implementation of babel fees is based on a novel ledger mechanism, which 
we call limited liabilities. These are negative token amounts (debt if you like) of 
strictly limited lifetime. Due to the limited lifetime of liabilities, we prevent any 
form of inflation (of the primary currency and of custom tokens). 

Transactions paid for with babel fees simply pay their fees with primary 
currency obtained by way of a liability. This liability is combined with custom 
tokens offered to any party that is willing to cover the liability in exchange for 
receiving the custom tokens. In the first instance, this allows block producers to 


1 https://docs.opengsn.org/.
process transactions with babel fees by combining them with a second fee-paying
transaction that covers the liability and collects the offered custom tokens. More 
generally, more elaborate matching markets can be set up. 

We describe native custom tokens and liabilities in the context of the UTXO 
ledger model. However, our contribution is more general and we sketch in the 
unabridged version [7, Appendix C] how it can be adapted for an account-based 
ledger. In summary, this paper makes the following contributions: 


— We introduce the concept of limited liabilities as a combination of nega- 
tive values in multi-asset token bundles with batched transaction processing 
(Sect. 2). 

— We introduce the concept of babel fees on the basis of limited liabilities as a 
means to pay transaction fees in tokens other than a ledger’s primary currency 
(Sect. 2). 

— We present formal ledger rules for a UTXO multi-asset ledger with limited
liabilities (Sect. 3). 

— We present a concrete spot market scheme for block producers to match babel 
fees (Sect. 4). 

— We present a solution to the knapsack problem that block producers have to 
solve to maximise their profit in the presence of babel fees (Sect. 5). 


We discuss related work in Sect. 6. 


2 Limited Liabilities in a Multi-asset Ledger 


To realise babel fees by way of liabilities, we require a ledger that supports 
multiple native assets—i.e., a number of tokens accounted for by the ledger’s 
builtin accounting. Moreover, one of these native tokens is the primary currency 
of the ledger. The primary currency is used to pay transaction fees and may 
have other administrative functions, such as staking in a proof-of-stake system. 


2.1 Native Custom Assets 


To illustrate limited liabilities and Babel fees by way of a concrete ledger model, 
we use the UTXOma ledger model [6], an extension of Bitcoin’s unspent transaction
output (UTXO) model to natively support multiple assets.² For reference,
we list the definitions of that ledger model in the unabridged paper [7,
Appendix A], with the exception of the ledger rules that we discuss in the fol- 
lowing section. To set the stage, we summarise the main points of the ledger 
model definitions in the following. 

We consider a ledger $l$ to be a list of transactions $[t_1, \ldots, t_n]$. Each of these
transactions consists of a set of inputs $is$, a list of outputs $os$, a validity interval
$vi$, a forge field $value_{forge}$, a set of asset policy scripts $ps$, and a set of signatures
$sigs$. Overall, we have


2 The UTXOma ledger model is in production use in the Cardano blockchain.


$$
\begin{aligned}
t = (&\mathit{inputs} : is,\ \mathit{outputs} : os,\ \mathit{validityInterval} : vi, \\
&\mathit{forge} : value_{forge},\ \mathit{scripts} : ps,\ \mathit{sigs} : sigs)
\end{aligned}
$$


The inputs refer to outputs of transactions that occur earlier on the ledger— 
we say that the inputs spend those outputs. The outputs, in turn, are pairs of 
addresses and values: (addr : a, value : v), where addr is the hash of the public 
key of the key pair locking that output and value is the token bundle encoding the
multi-asset value carried by the output. We don’t discuss script-locked outputs 
in this paper, but they can be added exactly as described in [5]. 

Token bundles are, in essence, finite maps that map an asset ID to a 
quantity—i.e., to how many tokens of that asset are present in the bundle in ques- 
tion. The asset ID itself is a pair of a hash of the policy script defining the asset’s 
monetary policy and a token name, but that level of detail has no relevance to 
the discussion at hand. Hence, for all examples, we will simply use a finite map 
of assets or tokens to quantities; e.g., $\{wBTC \mapsto 0.5, MyCoin \mapsto 5, nft \mapsto 1\}$
contains 0.5 wrapped Bitcoin, five MyCoin and one nft. 
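
As a minimal sketch of the finite-map view, a token bundle can be modelled as a Python dictionary from asset names to quantities. The helper names `add_bundles` and `quantity` are hypothetical, and the use of `Fraction` for fractional amounts is an assumption made for exact arithmetic.

```python
from fractions import Fraction


def quantity(bundle, asset):
    """Quantity of an asset in a bundle; absent assets count as zero."""
    return bundle.get(asset, 0)


def add_bundles(a, b):
    """Pointwise sum of two token bundles, dropping zero entries so that
    equal bundles have equal representations."""
    total = dict(a)
    for asset, qty in b.items():
        total[asset] = total.get(asset, 0) + qty
    return {asset: qty for asset, qty in total.items() if qty != 0}


example = {"wBTC": Fraction(1, 2), "MyCoin": 5, "nft": 1}
```

Adding a bundle with negative entries models, for instance, the effect of a forge field that burns tokens.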

The forge field in a transaction specifies a token bundle of minted (positive) 
and burned (negative) tokens. Each asset occurring in the forge field needs to 
have its associated policy script included in the set of policy scripts ps. Moreover, 
the sigs field contains all signatures signing the transaction. These signatures
need to be sufficient to unlock all outputs spent by the transaction’s inputs is. 
Finally, the validity interval specifies a time frame (in an abstract unit of ticks 
that is dependent on the length of the ledger) in which the transaction may be 
admitted to the ledger. 

We call the set of all outputs that (1) occur in a transaction in ledger l and 
(2) are not spent by any input of any transaction in l the ledger’s UTXO set—it 
constitutes the ledger’s state. 
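
The definition of the UTXO set translates directly into a small function. This is a sketch under an assumed data model: each transaction is a dictionary with a hypothetical `id`, a collection of input references of the form (transaction id, output index), and a list of outputs.

```python
def utxo_set(ledger):
    """Return a dict mapping (tx_id, index) to every output that occurs in
    the ledger and is not spent by any input of any transaction in it."""
    spent = {ref for tx in ledger for ref in tx["inputs"]}
    return {
        (tx["id"], index): output
        for tx in ledger
        for index, output in enumerate(tx["outputs"])
        if (tx["id"], index) not in spent
    }
```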


2.2 Limited Liabilities 


In a UTXO, the value for a specific token in a token bundle is always positive. In 
other words, the value component of a UTXO is always a composition of assets. 
It cannot include a debt or liability. We propose to locally change that. 


Liabilities. We call a token in a token bundle that has a negative value a 
liability. In other words, for a token bundle value and asset a, if value(a) < 0, 
the bundle value includes an a-liability. 


Transaction Batches. In order to prevent liabilities appearing on the ledger 
proper, we do not allow the state of a fully valid ledger to contain UTXOs 
whose value includes a liability. We do, however, permit the addition of multiple 
transactions at once to a valid ledger, as long as the resulting ledger is again fully 
valid; i.e., its UTXO set is again free of liabilities. We call a sequence of multiple
transactions ts, which are being added to a ledger at once, a transaction batch. 
A transaction batch may include transaction outputs with liabilities as long as 
those liabilities are resolved by subsequent transactions in the same batch. 
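
Consider the following batch of two transactions:


$$
\begin{aligned}
t_1 = (&\mathit{inputs} : is, \\
&\mathit{outputs} : [(\mathit{addr} : \emptyset, \mathit{value} : \{T_1 \mapsto -5, T_2 \mapsto 10\}), \\
&\qquad\qquad\;\, (\mathit{addr} : \kappa_1, \mathit{value} : \{T_1 \mapsto 5\})], \\
&\mathit{validityInterval} : vi,\ \mathit{forge} : 0,\ \mathit{scripts} : \{\},\ \mathit{sigs} : sigs) \\
t_2 = (&\mathit{inputs} : \{(\mathit{outputRef} : (t_1, 0), \mathit{key} : \emptyset), i_{T_1}\}, \\
&\mathit{outputs} : [(\mathit{addr} : \kappa_2, \mathit{value} : \{T_2 \mapsto 10\})], \\
&\mathit{validityInterval} : vi,\ \mathit{forge} : 0,\ \mathit{scripts} : \{\},\ \mathit{sigs} : sigs')
\end{aligned}
$$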


The first output of transaction $t_1$ may be spent by anybody ($\mathit{addr} = \emptyset$). It
contains both a liability of $-5\,T_1$ and an asset of $10\,T_2$. The second transaction
$t_2$ spends that single output of $t_1$ and has a second input $i_{T_1}$, which we assume
consumes an output containing $5\,T_1$, which is sufficient to cover the liability.

Overall, we are left with $5\,T_1$ exposed in $t_1$’s second output and locked by $\kappa_1$
as well as $10\,T_2$, which $t_2$ exposes in its single output, locked with the key $\kappa_2$.
Both transactions together take a fully valid ledger to a fully valid ledger as the 
liability is resolved within the transaction batch. 
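
The all-or-nothing treatment of a batch can be sketched as follows. This is an illustration only: the dictionary-based data model and the function names are assumptions, and the sketch checks nothing but the liability condition (it ignores signatures, value preservation, fees, and validity intervals).

```python
def has_liability(bundle):
    """A bundle includes a liability if any quantity is negative."""
    return any(qty < 0 for qty in bundle.values())


def apply_batch(utxos, batch):
    """Apply all transactions of a batch to a UTXO dict at once.

    utxos: dict (tx_id, index) -> {"addr": ..., "value": bundle}
    batch: list of transactions with "id", "inputs" (list of references)
           and "outputs" (list of outputs)
    Returns the new UTXO dict, or None if the batch is rejected as a whole."""
    state = dict(utxos)
    for tx in batch:
        for ref in tx["inputs"]:
            if ref not in state:
                return None             # spends a missing or already spent output
            del state[ref]
        for index, output in enumerate(tx["outputs"]):
            state[(tx["id"], index)] = output
    # Liabilities may exist inside the batch, but not in the resulting state.
    if any(has_liability(out["value"]) for out in state.values()):
        return None
    return state
```

Applied to a two-transaction batch shaped like the example above, the first transaction alone is rejected because its liability would remain in the state, whereas both together are accepted.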

We have these two facts: (a) we have one transaction resolving the liability 
of another and (b) liabilities are not being permitted in the state of a fully 
valid ledger. Consequently, transaction batches with internal liabilities are either 
added to a ledger as a whole or all transactions in the batch are rejected together. 
This in turn implies that, in a concrete implementation of liabilities in a ledger 
on a blockchain, the transactions included in one batch always need to go into 
the same block. A single block, however, may contain several complete batches. 


Pair Production. Liabilities in batches enable us to create transactions that 
temporarily (i.e., within the batch) inflate the supply of a currency. For example, 
consider a transaction $t$ with two outputs $o_1$ and $o_2$, where $o_1$ contains $5000\,T$
and $o_2$ contains $-5000\,T$. While value is being preserved, we suddenly do have
a huge amount of $T$ at our disposal in $o_1$. In loose association with the somewhat
related phenomenon in quantum physics, we call this pair production: the
creation of balancing positive and negative quantities out of nothing.

As all liabilities are confined to one batch of transactions only, this does not 
create any risk of inflation on the ledger. However, in some situations, it can still 
be problematic as it may violate invariants that an asset’s policy script tries to 
enforce. For example, imagine that T is a role token [5] - i.e., a non-fungible, 
unique token that we use to represent the capability to engage with a contract. In 
that case, we surely do not want to support the creation of additional instances 
of the role token, not even temporarily. 

In other words, whether to permit pair production or not depends on the 
asset policy of the produced token. Hence, we will require in the formal ledger 
rules, discussed in Sect. 3, that transactions producing a token T always engage
T’s asset policy to validate the legitimacy of the pair production. 


2.3 Babel Fees


Now, we are finally in a position to explain the concrete mechanism underlying 
babel fees. The basic idea is simple: assume a transaction t that attracts a fee of 
$x$ $C$ (where $C$ is the ledger’s primary currency), which we would like to pay in
custom currency $T$. We add an additional babel fee output $o_{babel}$ with a liability
to $t$: $o_{babel} = \{C \mapsto -x, T \mapsto y\}$. This output indicates that we are willing to
pay $y$ $T$ to anybody who pays the $x$ $C$ in return. Hence, anybody who consumes
$o_{babel}$ will receive the $y$ $T$, but will at the same time have to compensate the
liability of $-x$ $C$. The two are indivisibly connected through the token bundle.
Thus, we may view a token bundle that combines a liability with an asset as a 
representation of an atomic swap. 

The transaction t can, due to the liability, never be included in the ledger 
all by itself. The liability $-x$ $C$ does, however, make a surplus of $x$ $C$ available
inside t to cover t’s transaction fees. 

To include $t$ in the ledger, we need a counterparty to whom $y$ $T$ is worth at
least $x$ $C$. That counterparty batches $t$ with a fee-paying transaction $t_{fee}$ that
consumes $o_{babel}$. In addition, $t_{fee}$ will have to have another input from which it
derives the $x$ $C$ together with its own transaction fee, all out of the counterparty’s
assets. The transaction $t_{fee}$ puts the $y$ $T$, by itself, into an unencumbered output
for subsequent use by the counterparty. Finally, the counterparty combines $t$ and
$t_{fee}$ into a transaction batch for inclusion into the ledger.
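
The bookkeeping of the fee-paying transaction can be checked with a short sketch. The helper names, the dictionary model of token bundles, and the simple balance rule (inputs equal outputs plus the transaction's own fee in $C$) are assumptions made for illustration; the formal rules are those of Sect. 3.

```python
def add_bundles(*bundles):
    """Pointwise sum of token bundles, dropping zero entries."""
    total = {}
    for bundle in bundles:
        for asset, qty in bundle.items():
            total[asset] = total.get(asset, 0) + qty
    return {asset: qty for asset, qty in total.items() if qty != 0}


def babel_output(fee, token, offer):
    """Bundle offering `offer` units of `token` against a liability of `fee` C."""
    return {"C": -fee, token: offer}


def fee_tx_is_balanced(o_babel, funding, outputs, own_fee):
    """Check that a fee-paying transaction consuming o_babel and a funding
    input covers the liability and its own fee, and leaves no liabilities."""
    consumed = add_bundles(o_babel, funding)
    produced = add_bundles(*outputs, {"C": own_fee})
    no_liabilities = all(q >= 0 for out in outputs for q in out.values())
    return consumed == produced and no_liabilities
```

For instance, with $x = 2$, $y = 30$, a funding input of 10 $C$ and an own fee of 1 $C$, the counterparty ends up with 30 $T$ and 7 $C$ in its outputs.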

In Sect. 4, we will outline a scheme based on Babel fees and fee paying 
transactions, where block producing nodes act as fee paying counterparties for 
transactions that offer Babel fees in the form of custom tokens that are valuable 
to those block producers. They do so, on the fly, in the process of block production. 


2.4 Other Uses of Liabilities and Liabilities on Account-Based Ledgers 


Due to space constraints, we relegate a discussion of other uses of liabilities to 
the unabridged paper [7, Appendix B], which, in Appendix C, also describes how 
limited liabilities can be realised on an account-based ledger. 


3 Formal Ledger Rules for Limited Liabilities 


In this section, we formalise the concept of limited liabilities by building on the 
UTXOma ledger; i.e., the UTXO ledger with custom native tokens as introduced 
in existing work [6]. To add support for limited liabilities, we modify the ledger 
rules in three ways: 


1. The original UTXOma rules define ledger validity by adding transactions 
to the ledger one by one. We extend this by including the ability to add 
transactions in batches; i.e., multiple transactions at once. 

2. We drop the unconditional per-transaction ban on negative values in 
transaction outputs and replace it by the weaker requirement that there remain 
no negative values at the fringe of a batch of transactions. In other words, 
liabilities are confined to occur inside a batch and are forced to be resolved 
internally in the batch where they are created. 

3. We amend the rules about the use of policy scripts such that the script of a 
token T is guaranteed to be run in every transaction that increases the supply 
of T. 


In this context, the supply of a token T in a given transaction t is the amount 
of T that is available to be locked by outputs of t. If that supply is larger than 
the amount of T that is consumed by all inputs of t taken together, then we 
regard t as increasing the supply. This may be due to forging T or due to pair 
production (as discussed in Sect. 2.2). 


3.1 Validity 


In the original UTXOma ledger rules, we extend a ledger l with one transaction 
t at a time. In the UTXOll ledger rules (UTXOma with limited liabilities), we 
change that to add transactions in a two stage process that supports the addition 
of batches of transactions ts with internal liabilities: 


1. We modify the definition of the validity of a transaction t in a ledger l from 
UTXOma, such that it gives us conditional validity of t in l for UTXOll as 
defined in Fig. 1. 

2. We define validity of a batch of one or more transactions ts by way of the 
conditional validity of the individual t ∈ ts together with the batch validity 
of ts in ledger l. 


We describe the details of these two stages in the following. 


3.2 Stage 1: Conditional Validity 


Conditional validity in UTXOll is defined very much like full validity in UTXOma. 
Figure 1 defines the conditions for transactions and ledgers to be conditionally 
valid, which are mutually dependent. The definitions in Figs. 1 and 2 are based 
on the ledger formalisation introduced for UTXOma [6]. We do not repeat this 
formalisation here to favour conciseness, but summarise it in the unabridged 
paper [7, Appendix A]. 


Definition 1 (Conditional validity of transactions and ledgers). A transaction 
t ∈ Tx is conditionally valid for a conditionally valid ledger l ∈ Ledger 
during tick currentTick if t abides by the conditional validity rules of Fig. 1, using 
the auxiliary functions summarised in Fig. 2. 

A ledger l ∈ Ledger, in turn, is conditionally valid if either l is empty or l is 
of the form t :: l′ with l′ being a conditionally valid ledger and t being conditionally 
valid for l′. 


Figure 1 highlights the two changes that we are making to the UTXOma 
rules: firstly, we struck out Rule (2), and secondly, we changed Rule (8) in two 

1. The current tick is within the validity interval 
   currentTick ∈ t.validityInterval 

2. [struck out] All outputs have non-negative values 
   For all o ∈ t.outputs, o.value ≥ 0 

3. All inputs refer to unspent outputs 
   {i.outputRef : i ∈ t.inputs} ⊆ unspentOutputs(l) 

4. Value is preserved 
   t.forge + Σ_{i ∈ t.inputs} getSpentOutput(i, l) = Σ_{o ∈ t.outputs} o.value 

5. No output is locally double spent 
   If i1, i2 ∈ t.inputs and i1.outputRef = i2.outputRef then i1 = i2. 

6. All inputs validate 
   For all i ∈ t.inputs, there exists sig ∈ t.sigs, verify(i.key, sig, txId(t)) 

7. Validator scripts match output addresses 
   For all i ∈ t.inputs, keyAddr(i.key) = getSpentOutput(i, l).addr 

8. Forging 
   ★ A transaction which changes the supply (i.e., changedSupply(t, l) ≠ {}) is only 
   valid if either: 
   (a) the ledger l is empty (that is, if it is the initial transaction). 
   (b) ★ for every policy ID h ∈ changedSupply(t, l), there exists s ∈ t.scripts with 
       h = scriptAddr(s). 

9. All scripts validate 
   For all s ∈ t.scripts, 
   ⟦s⟧(scriptAddr(s), t, {getSpentOutput(i, l) | i ∈ t.inputs}) = true 


Fig. 1. Conditional validity of a transaction t in a ledger l permitting liabilities 


places marked with ★. The removal of Rule (2) permits liabilities in the first 
place. Outputs may now contain negative values and, if they do, the associated 
transaction is merely conditionally valid. Full validity is now conditional on 
resolving all liabilities from other transactions that are added in the same batch. 

Moreover, the change to Rule (8) ensures that transactions that change the 
supply of a token under a policy s with script address h do run the policy script 
s, regardless of whether the change in supply is due to a non-empty forge field 
t.forge or due to pair production. In either case, the script is guaranteed an 

-- output references provided by a transaction 
unspentTxOutputs : Tx → Set[OutputRef] 
unspentTxOutputs(t) = {(txId(t), 1), ..., (txId(t), |t.outputs|)} 

-- a ledger's UTXO set 
unspentOutputs : Ledger → Set[OutputRef] 
unspentOutputs([]) = {} 
unspentOutputs(t :: l) = (unspentOutputs(l) \ t.inputs) ∪ unspentTxOutputs(t) 

-- the outputs spent by the given set of transaction inputs 
getSpentOutput : Input × Ledger → Output 
getSpentOutput(i, l) = lookupTx(l, i.outputRef.id).outputs[i.outputRef.index] 

-- policy IDs of assets whose amount varies 
policiesWithChange : Quantities × Quantities → Set[PolicyID] 
policiesWithChange(val1, val2) = {a.pid | a ∈ supp(val1 − val2)} 

-- policy IDs whose supply changed in the transaction 
changedSupply : Tx × Ledger → Set[PolicyID] 
changedSupply(t, l) = 
    policiesWithChange(Σ_{o ∈ getSpentOutput(t.inputs)} o.value⁺, Σ_{o ∈ t.outputs} o.value⁺) ∪ 
    policiesWithChange(Σ_{o ∈ getSpentOutput(t.inputs)} o.value⁻, Σ_{o ∈ t.outputs} o.value⁻) 
  where 
    value⁺(a) = if value(a) > 0 then value(a) else 0 
    value⁻(a) = if value(a) < 0 then value(a) else 0 


Fig. 2. Auxiliary validation functions 


opportunity to validate that the increase in supply abides by the rules enforced 
by the token policy. In other words, transactions that contain supply changes 
that violate the associated token policy are guaranteed to be rejected. 


Changed Supply. The change in supply is computed with the help of the function 
changedSupply(t, l) (defined in Fig. 2) that, for a given ledger l, determines 
all policy script hashes h that control an asset whose supply is changed by the 
transaction t. Such a change may be due to the minting or burning of assets 
in the transaction's forge field t.forge or it may be due to pair production, as 
discussed in Sect. 2.2. The function changedSupply spots supply changes by 
comparing the quantity of assets and asset liabilities in the inputs and outputs of a 
transaction. It uses the helper functions value⁺ and value⁻ to filter all positive 
(assets) and negative (liabilities) quantities, respectively, out of a token bundle. 
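
The comparison performed by changedSupply can be sketched in Python as follows. This is an illustrative simplification rather than the paper's formalisation: token bundles are dictionaries keyed by hypothetical (policy ID, token name) pairs, the spent outputs are passed in directly instead of being looked up in a ledger, and all function names are assumptions.

```python
def sum_bundles(bundles):
    """Pointwise sum of a collection of token bundles."""
    total = {}
    for bundle in bundles:
        for asset, quantity in bundle.items():
            total[asset] = total.get(asset, 0) + quantity
    return total


def positive_part(bundle):
    """Keep only the assets (positive quantities) of a bundle."""
    return {asset: q for asset, q in bundle.items() if q > 0}


def negative_part(bundle):
    """Keep only the liabilities (negative quantities) of a bundle."""
    return {asset: q for asset, q in bundle.items() if q < 0}


def policies_with_change(val1, val2):
    """Policy IDs of all assets whose quantity differs between val1 and val2."""
    assets = set(val1) | set(val2)
    return {asset[0] for asset in assets if val1.get(asset, 0) != val2.get(asset, 0)}


def changed_supply(spent_outputs, outputs):
    """Policy IDs whose supply of assets or of liabilities changes."""
    assets_changed = policies_with_change(
        sum_bundles(positive_part(o) for o in spent_outputs),
        sum_bundles(positive_part(o) for o in outputs),
    )
    liabilities_changed = policies_with_change(
        sum_bundles(negative_part(o) for o in spent_outputs),
        sum_bundles(negative_part(o) for o in outputs),
    )
    return assets_changed | liabilities_changed


C = ("policyC", "C")  # hypothetical asset identifiers
T = ("policyT", "T")

# Pair production of T: +3 T and -3 T appear although nothing is forged.
pair_production = changed_supply([{C: 5}], [{C: 5, T: 3}, {T: -3}])
# A plain transfer that merely splits 5 C into two outputs.
plain_transfer = changed_supply([{C: 5}], [{C: 2}, {C: 3}])
```

In the pair-production case the overall balance of T is zero, yet both the positive and the negative side change, so the policy of T is reported and its script has to be run.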


Script Validation. Rule (8) uses the set of hashes of policy scripts computed 
by changedSupply to check that all the corresponding scripts are included in the 
t.scripts field. The scripts in t.scripts are exactly those that Rule (9) executes. 

Note that the primary currency of the ledger may require a special case in 
this rule. The total supply of the primary currency may be constant as part of 
the ledger implementation, and therefore its minting policy will always fail to 
validate, even in the case of producing and consuming transient debt. This may 
be addressed in (among others) one of the following ways: either modify the 
policy to specifically allow pair production of the primary currency, or modify 
this rule to not check the primary currency policy at all. 


3.3 Stage 2: Batch Validity 


For a ledger to be valid, we require that it is conditionally valid and that its 
state (i.e., the set of unspent outputs) does not contain any negative quantities. 


Definition 2 (Ledger validity). A ledger l : Ledger is (fully) valid if l is 
conditionally valid and also, for all o ∈ unspentOutputs(l), o.value ≥ 0. 


On that basis, we define the validity of a batch of transactions ts for a valid 
ledger l. 


Definition 3 (Validity of a batch of transactions). A batch of transactions 
ts : List[Tx] is (fully) valid for a valid ledger l : Ledger if ts ++ l is a fully valid 
ledger. 
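
The fringe condition of Definitions 2 and 3 can be illustrated with a small Python sketch. It is a deliberately reduced model and not the paper's rule set: the ledger state is a dictionary from output references to token bundles, a transaction is a hypothetical triple (tx_id, inputs, outputs), the batch is given in the order in which its transactions are applied, and of the conditional validity rules only "all inputs refer to unspent outputs" is checked.

```python
def apply_batch(utxo, batch):
    """Apply transactions in order; returns the resulting set of unspent outputs."""
    state = dict(utxo)
    for tx_id, inputs, outputs in batch:
        for ref in inputs:
            if ref not in state:
                raise ValueError(f"input {ref} is not unspent")
            del state[ref]
        for index, bundle in enumerate(outputs):
            state[(tx_id, index)] = bundle
    return state


def fringe_is_liability_free(state):
    """Definition 2 (reduced): no unspent output holds a negative quantity."""
    return all(q >= 0 for bundle in state.values() for q in bundle.values())


def batch_is_valid(utxo, batch):
    """Definition 3 (reduced): the state after the whole batch has no liabilities."""
    try:
        state = apply_batch(utxo, batch)
    except ValueError:
        return False
    return fringe_is_liability_free(state)


# Hypothetical ledger state: a user holding 12 T and a counterparty holding 3 C.
utxo = {("u", 0): {"T": 12}, ("g", 0): {"C": 3}}

# t keeps 2 T and offers 10 T against a liability of 2 C (its fee).
t = ("t", [("u", 0)], [{"T": 2}, {"C": -2, "T": 10}])
# t_fee consumes the babel fee output and 3 C (2 C liability + 1 C own fee).
t_fee = ("t_fee", [("t", 1), ("g", 0)], [{"T": 10}])
```

With these values, the batch [t] alone leaves a liability at the fringe, whereas [t, t_fee] resolves it internally.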


4 Implementing Babel Fees 


In this section, we describe a concrete spot market, where users can exchange 
custom tokens via the babel fees mechanism described in Sect. 2.3. This spot 
market comprises a set of sellers S = {s1, s2, ..., sn} and a set of buyers³ B = 
{b1, b2, ..., bm}. Sellers sell bundles of custom tokens to buyers, who in return 
provide primary tokens to cover the fees incurred by the transactions submitted 
by the sellers to the network. 


4.1 Babel Offers 


In this context, a transaction with a babel fee output (as per Sect. 2.3) essentially 
constitutes an offer—specifically, the offer to obtain a specified amount of custom 
tokens by paying the liability in primary tokens included in the babel fee output. 
We define such offers as follows. 


Definition 4. We define a babel offer to be a tuple of the form: 


BabelOffer ≝ (Tx_id, TName, TAmount, Liability) 


where Tx_id is a unique identifier of the transaction containing the babel fee 
output, TName is a string corresponding to the name of a custom token, TAmount 
is a positive integer ∈ Z⁺ corresponding to the amount of tokens offered and 
Liability is a negative integer ∈ Z⁻ corresponding to the amount in primary 
tokens that has to be paid for obtaining the tokens. 


3 Buyers in this market are the block issuers of the blockchain. 


Sellers produce such babel offers, which are then published to the network and 
are visible to all buyers. 


4.2 Exchange Rates 


In our model, we assume that the spot market of babel offers operates in distinct 
rounds.⁴ In every round, a buyer is selected from the set B at random. The 
selected buyer has the opportunity to accept some of the outstanding offers by 
paying the corresponding liabilities. The rational buyer chooses the offers that 
maximise her utility function, which we elaborate on in Sect. 5. 

In order to help sellers to make attractive offers, we assume that every buyer 
i, i = 1, 2, ..., m, publishes a list L_i[(T_j, XR_j)] of exchange rates XR_j for every 
exchangeable custom token T_j, j = 1, ..., k. The list of exchange rates from all 
buyers, BL[i] = L_i, i = 1, ..., m, is available to all sellers s ∈ S. Note that a 
buyer can set XR_j = +∞ if they do not accept the token. 

Given a specific babel offer g = (tx_g, (token_g, amount_g, liability_g)) offering 
an amount of a custom token token_g, and assuming that there is only a single 
buyer b with a published exchange rate XR_g for token_g (expressed in units of 
token_g per unit of primary token), an attractive offer should satisfy the 
inequality amount_g ≥ |liability_g| · XR_g. 
Naturally, an offer gets more attractive to the degree that excess tokens are 
offered over the minimum needed to meet the exchange rate for the liability. 


4.3 Coverage 


To generalise to the case where m possible buyers express an interest in token_g, 
we need to consider the following question: how many token_g does a seller need 
to offer to ensure that P% of buyers consider the offer attractive? 

The seller has to choose the cheapest P-th percentile from the available 
exchange rates listed for token_g, which by definition is satisfied by an 
effective exchange rate that is greater than P% of the published exchange rates. In 
other words, for the offer g from above to be attractive to P% of buyers, the 
seller needs to choose the amount for token_g as follows: 


amount_g ≥ |liability_g| · percentile(P, token_g, BL)     (1) 


where percentile(P, token_g, BL) is the lowest exchange rate for token_g, such 
that it is still greater than P% of the exchange rates listed for that token in the 
exchange rate table BL. In this case, we say that the offer g has P% coverage. 

For example, assume a liability of 0.16 primary tokens and a set of 10 
buyers with the following published exchange rates for token_g: BL_{token_g} = 
{1.63, 1.38, 3.00, 1.78, 2.00, 1.81}. If a seller wants to ensure that more than 70% 
of the buyers will consider her offer, she computes the 70th percentile of the 
exchange rates, which is 2.00. Thus, the seller knows that she has to offer at 
least 0.16 × 2.00 = 0.32 of token_g. 


4 In practice this can be the block-issuing rounds. 
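
A seller-side computation of the minimum amount in Eq. (1) might look like the following Python sketch. It rests on one possible reading of the percentile, namely the smallest published rate that is accepted by at least P% of the listed buyers; the function names and the sample rates below are hypothetical and are not the data of the example above.

```python
def percentile_rate(p, rates):
    """Smallest listed rate that satisfies at least p% of the listed buyers.

    p is an integer percentage in 1..100; rates holds one published rate per buyer.
    """
    ordered = sorted(rates)
    # Integer ceiling of p% of the number of buyers, at least one buyer.
    needed = max(1, (p * len(ordered) + 99) // 100)
    return ordered[needed - 1]


def minimum_amount(liability, p, rates):
    """Token amount required for p% coverage, following the shape of Eq. (1)."""
    return abs(liability) * percentile_rate(p, rates)


# Hypothetical exchange rates published by four buyers for one custom token.
sample_rates = [4.0, 1.0, 3.0, 2.0]

half_coverage = minimum_amount(-0.5, 50, sample_rates)   # rate 2.0 covers 2 of 4
full_coverage = minimum_amount(-0.5, 100, sample_rates)  # rate 4.0 covers all
```

A buyer who publishes a rate of +∞ sorts last in this scheme and is therefore only "covered" when the seller asks for 100% coverage, which no finite amount achieves.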


4.4 Liveness 


Consider a babel offer that is published to the network and assume that there 
is at least one party b_i (buyer) that is attracted by this offer. The interested 
party will then create a transaction batch tx_g (see Sect. 2.2) that covers the 
liability and will publish it to the network with the expectation that this will 
(eventually) be included into a block and be published in the ledger implemented 
by the blockchain. Therefore, it is crucial to ensure censorship resilience for our 
Babel offers and show that our spot market for Babel offers enjoys the property 
of liveness [10]. 

If b_i is selected as a block issuer, then she will include the transaction batch 
in the block she will create and thus liveness is preserved. However, if b_i is never 
selected as a block issuer (or is selected with a very low probability), then we must 
ensure the accepted offer will eventually be included into the blockchain. In the 
following analysis, we distinguish between two cases: a) the case where all 
buyers are acting rationally (but not maliciously) and b) the case where a percentage 
of the buyers is controlled by a malicious adversary. Our detailed analysis is 
presented in the paper's unabridged version [7, Appendix D] and shows that 
our spot market indeed enjoys liveness, if the buyers are rational players trying to 
maximize their profit. Moreover, in the case of adversarial players, if an honest 
majority holds and a Babel offer attracts at least one honest player, then the accepted 
offer will be (eventually) published in the blockchain and thus liveness is preserved. 


5 Transaction Selection for Block Issuers 


A block issuer constructs a block of transactions by choosing from a set of 
available transactions called the mempool. A rational block issuer tries to 
maximize her utility. In our case, we assume that this utility is a value, 
corresponding to the amount of primary currency earned by this block. These 
earnings come from the transaction fees paid either in primary currency or 
custom tokens. Hence we assume the existence of a utility function of the form: 
utility :: CandidateBlock → Value, where CandidateBlock is a list of 
transactions, CandidateBlock ≝ List[CandidateTransaction], and Value is an amount 
∈ Z⁺ of primary currency at the lowest denomination. 


5.1 The Value of Babel Offers 


A candidate transaction residing in the mempool and waiting to be included in 
a block can be either a (single) transaction or a transaction batch (see Sect. 2.2). 
In the following, we define the concept of a candidate transaction: 


Definition 5. A candidate transaction residing in the mempool is defined as 
a quadruple: 


CandidateTransaction ≝ (Tx_id, Value, Liability, Size) 


where Tx_id is a unique identifier of a transaction (or a transaction batch) in the 
mempool, Value for the case of transactions corresponds to the transaction fees 
expressed in the primary currency, while for the case of transaction batches, it 
corresponds to the total value of the obtained custom tokens expressed as an amount 
in the primary currency. In the case of transaction batches, Liability ∈ Z⁻ is the 
amount expressed in the primary currency that has to be paid for covering this 
liability. In the case of transactions, it equals zero. Finally, Size is the total size of the 
transaction, or the transaction batch as a whole, expressed in bytes. 


We assume the existence of a function that can transform a Babel offer 
(Definition 4) into a candidate transaction batch: batchVal :: BabelOffer → 
CandidateTransaction. We need this function in order to be able to express the 
value of the obtained custom tokens in primary currency, so that Babel offers 
are comparable to the transaction fees of conventional transactions. Any such 
conversion function might be chosen by the block issuer based on her business 
logic of how to evaluate a specific offer. In particular, one reasonable approach 
to defining the conversion function is the following: 


\[
Value = \sum_{\forall token \in BabelOffer} TAmount \times \frac{nominalVal}{|Liability\ per\ token|} \times nominalVal
      = \sum_{\forall token \in BabelOffer} \frac{(TAmount \times nominalVal)^2}{|Liability|}
\]


The nominal value of the token, nominalVal, is essentially the current exchange 
rate against the primary currency; i.e., it expresses what amount of primary 
currency one custom token is worth. Therefore, if the exchange rate between a custom 
token T and the primary currency A is 3:1, then nominalVal = 0.33A. Of course, this 
rate is dynamic and it is determined by market forces just like with fiat currencies and 
Bitcoin fees. We assume that this information is available to the block producer, 
when they need to select candidate transactions from the mempool to include in 
a new block. In fact, block issuers can publish exchange rates for specific tokens 
they consider acceptable (as discussed in Sect. 4). Intuitively, the higher the 
nominal value, the more valuable the token is to the block issuer. 
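
For a single-token offer, the indicative conversion formula can be sketched in Python as below. The sketch assumes that "Liability per token" means |Liability| / TAmount, which is the reading under which the two forms of the formula coincide; the function name babel_offer_value and the sample numbers are hypothetical.

```python
import math


def babel_offer_value(t_amount, liability, nominal_val):
    """Value of a single-token Babel offer, expressed in primary currency.

    t_amount    -- number of custom tokens offered (positive)
    liability   -- primary currency to be paid for them (negative)
    nominal_val -- primary currency that one custom token is worth
    """
    liability_per_token = abs(liability) / t_amount
    return t_amount * (nominal_val / liability_per_token) * nominal_val


# Hypothetical offer: 10 tokens worth 0.5 each, against a liability of 4.
value = babel_offer_value(10, -4, 0.5)
# Closed form from the right-hand side of the formula.
closed_form = (10 * 0.5) ** 2 / abs(-4)
```

Under this formula an offer whose tokens are worth more than the liability is valued above the plain market worth of the tokens, and an offer that underpays is valued below it, which lets the block issuer rank Babel offers against ordinary fees.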

Hence, whenever a block issuer tries to assemble a block they face the fol- 
lowing optimization problem: 


Definition 6. The transaction selection problem TxSelection(n, S_B, M, R) is 
the problem of filling a candidate block of size S_B with a subset B_n ⊆ M 
of n available candidate transactions M = {tx_1, tx_2, ..., tx_n}, where we use 
B_n ⊆ {1, 2, ..., n}, without spending more than a reserve R of available 
primary currency on liabilities, in such a way that utility(B_n) ≥ utility(B′_n) ∀ 
block B′_n ⊆ M. Every candidate transaction tx_i = (i, v_i, l_i, s_i), for i = 1, ..., n, 
is defined according to Definition 5 and has a fixed liability l_i and size s_i in 
bytes. We assume that the value of a candidate transaction that corresponds 
to a Babel offer is not fixed; instead, it decreases (just as its desirability) as 
we select candidate transactions offering the same custom token for the block. 
Thus, the value v_i of a candidate transaction is expressed as a function of what 
has already been selected for the block, v_i(B_{i−1}) : CandidateBlock → Value, 
where B_{i−1} ⊆ {1, 2, ..., i − 1} and v_i(∅) = v_{0i} is the initial value of the offer 
and 0 ≤ v_i(B_{i−1}) ≤ v_{0i}. Finally, the utility function that we want to 
maximize is defined as utility = Σ_{i ∈ B_n} v_i(B_{i−1}), where B_{i−1} is the solution to the 
TxSelection(i − 1, S_B − Σ_j s_j, M − {i, ..., n}, R − Σ_j l_j) problem. 


5.2 Dynamic Programming 


We start with the presentation of an optimal solution to the transaction selection 
problem. It is a variation of the dynamic programming solution to the 0-1 knap- 
sack problem [9]. It is important to note that we want conventional transactions 
and transaction batch offers to be comparable only with respect to the value 
offered and their size. We do not want to view liability as another constraint to 
the knapsack problem, because this would favor zero liability candidate trans- 
actions (i.e., conventional transactions) over Babel offers. The liability aspect 
of the offer has already been considered in the value calculation of the conver- 
sion function from a BabelOffer to a CandidateTransaction, as shown in the 
indicative conversion formula above. 


5.3 Optimal Algorithm of the Transaction Selection Problem 


The optimal algorithm presented in Algorithm 1 proceeds as follows. Initially, 
we order the candidate transactions of M in descending order of their (initial) 
value per size ratio v_{0i}/s_i, i = 1, 2, ..., n. We maintain an array U[i], i = 1, 2, ..., n. 
Each entry U[i] is a list of tuples of the form (t_s, t_v, r, b). A tuple (t_s, t_v, r, b) 
in the list U[i] indicates that there is a block B assembled from the first i 
candidate transactions that uses space exactly t_s ≤ S_B, has a total value exactly 
utility(B) = t_v ≤ Σ_{i=1}^{n} v_{0i}, has a residual amount of primary currency to be 
spent on liabilities exactly r ≤ R and has a participation bit b indicating whether 
transaction i is included in B or not. 

This list does not contain all possible such tuples, but instead keeps track 
of only the most efficient ones. To do this, we introduce the notion of one 
tuple dominating another one; a tuple (t_s, t_v, r, b) dominates another tuple 
(t′_s, t′_v, r′, b′) if t_s ≤ t′_s and t_v ≥ t′_v; that is, the solution indicated by the 
tuple (t_s, t_v, r, b) uses no more space than (t′_s, t′_v, r′, b′), but has at least as much 
value. Note that domination is a transitive property; that is, if (t_s, t_v, r, b) 
dominates (t′_s, t′_v, r′, b′) and (t′_s, t′_v, r′, b′) dominates (t″_s, t″_v, r″, b″), then 
(t_s, t_v, r, b) also dominates (t″_s, t″_v, r″, b″). We will ensure that in any list, no tuple 
dominates another one; this means that we can assume each list U[i] is of the form 
[(t_{s1}, t_{v1}, r_1, b_1), ..., (t_{sk}, t_{vk}, r_k, b_k)] with t_{s1} < t_{s2} < ... < t_{sk} and 
t_{v1} < t_{v2} < ... < t_{vk}. Since every list U[i], i = 1, 2, ..., n, does not include 
dominated tuples and also the sizes of the transactions are integers and so are their 
values, we can see that the maximum length of such a list is min(S_B + 1, V_0 + 1), 
where V_0 = Σ_{i=1}^{n} v_{0i}. 

Algorithm 1 starts out with the initialization of list U[1] (line 2) and then 
iterates through the remaining n − 1 transactions (lines 3-10). In each iteration j, 
we initially set U[j] ← U[j − 1] after turning off the participation bit in all tuples 
(lines 4-5). Then for each tuple (t_s, t_v, r, b) ∈ U[j − 1], we also add the tuple 
(t_s + s_j, t_v + v_j(B_{j−1}), r − l_j, 1) to the list, if t_s + s_j ≤ S_B ∧ r − l_j ≥ R; that is, 
if by adding transaction j to the corresponding subset, we do not surpass the total 
available size S_B and do not deplete our reserve R for liabilities (lines 6-9). Note 
that the value of transaction j at this point is determined by the contents of 
the corresponding block B_{j−1} through the function call v_j(B_{j−1}). To this end, 
in lines 14-22 we provide a function that returns the block corresponding to 
a specific tuple. We finally remove from U[j] all dominated tuples by sorting 
the list with respect to their space component, retaining the best value for each 
space total possible, and removing any larger space total that does not have a 
corresponding larger value (line 10). We return the maximum total value from 
the list U[n] along with the corresponding block B_n (lines 11-13). 


Algorithm 1: Transaction selection algorithm for a block (Optimal Solution). 


Input: A set M of candidate transactions M = {tx_1, tx_2, ..., tx_n}, where 
       tx_i = (i, v_i(B_{i−1}), l_i, s_i) for i = 1, ..., n according to Definition 5 
Input: An amount of primary currency available for covering liabilities, called the reserve R. 
Input: An available block size S_B 
Input: A utility function util = Σ_{i ∈ B} v_i(B_{i−1}) 
Output: (B, util(B), res): A candidate block B ⊆ M such that 
        util(B) ≥ util(B′) ∀B′ ⊆ M, the value of this block (util(B)) and a residual 
        amount res from the reserve R such that res ≥ 0 
/* Assume array U[i]: Array[List[(Size, Value, Liability, Bit)]], i=1,...n */ 
 1  order transactions in M in descending order of v_{0i}/s_i, i = 1, 2, ..., n 
 2  U[1] ← [(0, 0, R, 0), (s_1, v_{01}, R − l_1, 1)] 
 3  for j = 2 to n do 
 4      baseList ← copy list U[j − 1] with zero participation bits for all tuples 
 5      U[j] ← baseList 
 6      foreach (t_s, t_v, r, b) ∈ baseList do 
 7          if t_s + s_j ≤ S_B ∧ r − l_j ≥ R then 
 8              B_{j−1} ← getBlock(U, j − 1, t_s) 
 9              Add tuple (t_s + s_j, t_v + v_j(B_{j−1}), r − l_j, 1) to U[j] 
10      Remove dominated tuples from list U[j] 
11  (S_final, V_max, residual, b) ← max_{(s,v,r,b) ∈ U[n]}(v) 
12  B_n ← getBlock(U, n, S_final) 
13  return (B_n, V_max, residual) 
14  getBlock(U: Array[List[(Size, Value, Liability, Bit)]], n: Tx_id, tsn: Size) return CandidateBlock 
15      B ← [] 
16      ts ← tsn 
17      for i = n down to 1 do 
18          (t_si, t_vi, r_i, b_i) ← getTuple(U[i], ts) 
19          if b_i == 1 then 
20              B ← i : B    // ":" is list construction 
21              ts ← ts − s_i 
22      return B 

Theorem 1. Algorithm 1 correctly computes the optimal value for the 
transaction selection problem. 


The proof of the theorem is contained in the unabridged paper [7, Appendix E]. 
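
To make the list-based dynamic programme more tangible, the following Python sketch implements a simplified variant of it. It departs from Algorithm 1 in several labelled ways: candidate values are fixed rather than functions v_i(B_{i−1}) of the block selected so far, liabilities are given as non-negative amounts that are deducted from the reserve (which must stay non-negative), each state carries its chosen indices directly instead of a participation bit with a separate getBlock backtracking step, and the initial ordering by value per size is omitted. As in the text, domination compares only space and value. All names are hypothetical.

```python
def prune_dominated(states):
    """Keep only states that are not dominated on (space, value)."""
    kept = []
    # Sort by space ascending and, for equal space, by value descending.
    for state in sorted(states, key=lambda st: (st[0], -st[1])):
        # A state survives only if it is worth strictly more than every
        # state that uses no more space.
        if not kept or state[1] > kept[-1][1]:
            kept.append(state)
    return kept


def select_transactions(candidates, block_size, reserve):
    """Knapsack-style selection over (value, liability, size) candidates.

    Returns (chosen indices, total value, residual reserve).
    """
    # Each state is (space used, total value, residual reserve, chosen indices).
    states = [(0, 0, reserve, ())]
    for index, (value, liability, size) in enumerate(candidates):
        extended = [
            (space + size, total + value, residual - liability, chosen + (index,))
            for space, total, residual, chosen in states
            if space + size <= block_size and residual - liability >= 0
        ]
        states = prune_dominated(states + extended)
    space, total, residual, chosen = max(states, key=lambda st: st[1])
    return list(chosen), total, residual


# Hypothetical mempool: two conventional transactions and one Babel offer
# (the third entry) that requires 5 units of primary currency from the reserve.
mempool = [(60, 0, 10), (100, 0, 20), (120, 5, 30)]

with_reserve = select_transactions(mempool, block_size=50, reserve=5)
without_reserve = select_transactions(mempool, block_size=50, reserve=4)
```

With a reserve of 5 the Babel offer is taken together with the second transaction; with a reserve of 4 it cannot be covered and the two conventional transactions are selected instead.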


5.4 Polynomial Approximation 


Since we iterate through all available n transactions and in each iteration we 
process a list of length min(S_B + 1, V_0 + 1), where V_0 = Σ_{i=1}^{n} v_{0i}, we can see 
that Algorithm 1 takes O(n min(S_B, V_0)) time. This is not a polynomial-time 
algorithm, since we assume that all input numbers are encoded in binary; thus, 
the size of the input number S_B is essentially log₂ S_B, and so the running time 
O(n S_B) is exponential in the size of the input number S_B, not polynomial. Based 
on the intuition that if the maximum value V_0 were bounded by a polynomial in 
n, the running time would indeed be polynomial in the input size, we now propose 
an approximation algorithm for the transaction selection problem that runs in 
polynomial time and is based on a well-known fully polynomial-time approximation 
scheme for the 0-1 knapsack problem [13]. 

The basic intuition of the approximation algorithm is that if we round the 
(integer) values of the candidate transactions to v′_i(B_{i−1}) = ⌊v_i(B_{i−1})/μ⌋, where 
0 ≤ v′_i(B_{i−1}) ≤ ⌊v_{0i}/μ⌋ = v′_{0i}, and run Algorithm 1 with values v′_i instead 
of v_i, then by an appropriate selection of μ, we could bound the maximum 
value V′_0 = Σ_{i=1}^{n} v′_{0i} by a polynomial in n and return a solution that is at 
least (1 − ε) times the value of the optimal solution (OPT). In particular, 
we choose μ = ε v_{0max}/n, where v_{0max} is the maximum value of a 
transaction; that is, v_{0max} = max_{i ∈ M}(v_{0i}). Then, for the total maximum value V′_0, 
we have V′_0 = Σ_{i=1}^{n} v′_{0i} = Σ_{i=1}^{n} ⌊v_{0i} / (ε v_{0max}/n)⌋ = O(n²/ε). Thus, the running 
time of the algorithm is O(n min(S_B, V′_0)) = O(n³/ε) and is bounded by a 
polynomial in n and 1/ε. Algorithm 2 contains our approximate algorithm for the 
transaction selection problem. Essentially, we run Algorithm 1 for the problem 
instance TxSelection(n, S_B, M′, R), where M′ = {tx′_1, tx′_2, ..., tx′_n}, and 
tx′_i = (i, v′_i(B_{i−1}), l_i, s_i) for i = 1, ..., n. We can now prove that this algorithm 
returns a solution whose value is at least (1 − ε) times the value of the optimal 
solution. 


Theorem 2. Algorithm 2 provides a solution which is at least (1 — €) times the 
value of OPT. 


The proof of the theorem is in the unabridged paper [7, Appendix F]. 
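
The rounding step that precedes the call to Algorithm 1 can be sketched in Python as follows. The sketch only scales the initial values v_{0i}; the function name scale_values and the sample numbers are hypothetical, and floating-point division is used for brevity, so exact rational arithmetic may be preferable for values that do not divide cleanly.

```python
import math


def scale_values(initial_values, eps):
    """Round values down to multiples of mu = eps * v0max / n.

    Returns mu together with the scaled integer values floor(v / mu).
    """
    n = len(initial_values)
    v0max = max(initial_values)
    mu = eps * v0max / n
    return mu, [math.floor(v / mu) for v in initial_values]


# Hypothetical initial values of four candidate transactions, with eps = 0.5.
mu, scaled = scale_values([100, 50, 20, 10], 0.5)
# Each scaled value is at most n / eps, so their sum is at most n**2 / eps.
bound = 4 ** 2 / 0.5
```

Running the selection on the scaled values keeps every dominance list short, at the price of losing at most μ of value per selected transaction.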


6 Related Work 


Babel fees are enabled by swap outputs based on limited-lifetime liabilities. 
These swaps, once proposed (as part of a complete transaction), can be 
resolved unilaterally by the second party accepting the swap, as elaborated in 
the unabridged paper [7, Appendix B]. 

Algorithm 2: Transaction selection algorithm for a block (Approximate 
Solution). 


Input: A set M of candidate transactions M = {tx_1, tx_2, ..., tx_n}, where 
       tx_i = (i, v_i(B_{i−1}), l_i, s_i) for i = 1, ..., n according to Definition 5 
Input: An amount of primary currency available for covering liabilities, called the reserve R. 
Input: An available block size S_B 
Input: A utility function util = Σ_{i ∈ B} v_i(B_{i−1}) 
Input: The acceptable error ε from the optimal solution, where 0 < ε < 1 
Output: (B, util(B), res): A candidate block B such that util(B) ≥ util(B′) ∀B′ ⊆ M, the 
        value of this block (util(B)) and a residual amount res from the reserve R such 
        that res ≥ 0 
v_{0max} ← max_{i ∈ M}(v_{0i}) 
μ ← ε v_{0max}/n 
v′_i(B_{i−1}) ← ⌊v_i(B_{i−1})/μ⌋ for i = 1, 2, ..., n 
run Algorithm 1 for the problem instance TxSelection(n, S_B, M′, R), where 
    M′ = {tx′_1, tx′_2, ..., tx′_n}, and tx′_i = (i, v′_i(B_{i−1}), l_i, s_i) for i = 1, ..., n 


Atomic Swaps and Collateralized Loans. Atomic swaps (which may be 
used to pay for fees) often go via an exchange, including for Ethereum ERC- 
20 tokens [15] and Waves’ custom natives [22], as well as multi-blockchain 
exchanges based on atomic swaps [12,14]. These exchanges come in varying 
degrees of decentralisation. Atomic swaps are also used for swapping or auc- 
tioning assets across chains [11,16]. Our proposal is fully decentralized and 
single-chain. It allows transactions carrying swap or fee-coverage offers to be 
disseminated directly via the blockchain network (because they are fully-formed 
transactions), without any off-chain communication. 

A notable difference between our swap mechanism and some layer-2 DEX 
solutions, such as Ethereum’s Uniswap [21] and SwapDEX [19], is that these 
require proof of liquidity (i.e. assets locked in a contract), as well as contract- 
fixed exchange rates. Our proposal enables users to accept the optimal number of 
exchange offers without an obligation to have liquidity or to accept them. Users 
are also free to choose and change their exchange rates at any time, without 
on-chain actions. 

Our limited-lifetime liabilities are a sort of loan, but one that is resolved 
before it is even recorded on the ledger. There is also work on ledger-based 
loans [4,20], but this leads to rather different challenges and mechanisms. In 
particular, the liabilities we propose do not require collateral backing (as they 
are resolved within a single batch). Moreover, unlike either atomic swaps or 
collateralized loans, our mechanism requires no actions from the user after sub- 
mitting a swap offer or fee-less transaction to the network. 

These mechanisms, while having some capacity to address some of the same 
shortcomings as the babel fees mechanism, are usually a combination of 
off-chain and layer-2 (smart-contract-based) solutions and are quite different from 
the single-chain, ledger-integrated proposal we provided. 


Child Pays for Parent. The UTXO model enforces a partial ordering on 
transactions that can be taken advantage of to encourage block producers to 
include less desirable (smaller-fee) transactions in a block by also disseminating 
a higher-fee transaction that depends on the undesirable transaction. This is 
known as a child-pays-for-parent technique [17]. Like ours, it deals with fully 
formed transactions, and requires no further input from the author of the small- 
fee transaction. The solution we propose, however, is geared towards a ledger 
model where transaction validation rules enforce a minimum fee, so any trans- 
action that does not pay it (via liabilities or directly) will be rejected regardless 
of whether a high-fee transaction depends on it. 


Ethereum. Ethereum’s Gas Station Network (GSN) [1] infrastructure con- 
sists of (a) a network of nodes listening for meta-transactions (transaction-like 
requests to cover transaction fees), which turn these requests into complete trans- 
actions, with fees covered by the relay node, and (b) an interface that contracts 
must implement in order for the relay nodes to use this contract’s funds to 
subsidize the transaction fees. 

Babel fees are simpler as they do not require any of the following (all of which 
the GSN relies on): (1) dissemination of partially formed (meta-)transactions on 
a separate network, (2) additional infrastructure, such as relays, relay hubs, and 
a separate communication network, (3) any changes to smart contracts to allow 
them to participate, (4) submitting transactions to make or update fee-covering 
or exchange offers, (5) any further action from the user after submitting a 
transaction that requires its fees to be covered, and (6) pre-paying for the fee 
amounts contracts are able to cover. 

Another solution for processing transactions without any primary currency 
included to cover fees, called Etherless Ethereum Tokens, is proposed in [3]. This 
approach includes a formal composability framework (including formal proofs 
of important properties), requires notably less gas consumption, and offers a 
much more seamless user experience than the GSN. However, it still relies on 
the off-chain dissemination of meta-transactions, and requires changes to smart 
contracts to opt in to participation, as well as fixing an exchange rate. 


Algorand. Algorand is an account-based cryptocurrency which supports cus- 
tom native tokens. It provides users with a way to perform atomic transfers (see 
[2]). An atomic transfer requires combining unsigned transactions into a single 
group transaction, which must then be signed by each of the participants of each 
of the transactions included. This design allows users to perform, in particular, 
atomic swaps, which might be used to pay fees in non-primary currencies. 

As with our design, the transactions get included into the ledger in batches. 
Unlike Babel fees, however, incomplete transactions cannot be sent off to be 
included in the ledger without any further involvement of the transaction author. 


Debt Representation in UTXO Blockchains. There are similarities 
between the debt representation proposal presented in [8] and the mechanism 
we propose, the main one being the idea of representing debt as special inputs 
on a UTXO ledger. Unlike the debt model we propose, the model presented 
in that paper allows debt to be recorded in a persistent way on the ledger. As 
we prevent liabilities from ever entering the ledger state, we sidestep the main issues 
discussed in [8], including the need for managing permissions for issuing debt on 
the ledger, and therefore also for the trust users may be obligated to place in 
the debt issuer, and vice-versa. The possibility of unresolved debt remaining on 
the ledger (and therefore inflation) is a concern that needs to be taken seriously 
in this case. 

Debt recorded on the ledger state (and outside a transaction batch) enables 
functionality that we cannot support with limited liabilities. Moreover, if a debt- 
creating transaction is complete and ready to be applied to the ledger, all nodes 
are able to explicitly determine the validity of this transaction. This way, these 
transactions can be relayed by the existing network, without any special 
consideration for their potential to be included in a batch, and by whom. 

Another key difference between the two proposals is that ours assumes an 
underlying multi-asset ledger, so that the debt-outputs have another major 
interpretation—they also serve as offers for custom token fee coverage, as well 
as swaps. Finally, the ledger we propose treats debt outputs and inputs in a 
uniform way, rather than in terms of special debt transactions and debt pools, 
which result in potentially complicated special cases. 


Stellar DEX. The Stellar system [18] supports a native, ledger-implemented 
DEX to provide swap functionality (and therefore, custom token fee payment). 

In the Stellar DEX, offers posted by users are stored on the ledger. A trans- 
action may attempt an exchange of any asset for any other asset, and will fail 
if this exchange is not offered. This approach requires submitting transactions 
to manage a user’s on-chain offers, and also requires all exchanges to be exact— 
which means no overpaying is possible to get one’s bid selected. A transaction 
may attempt to exchange assets that are not explicitly listed as offers in exchange 
for each other on the DEX. The DEX, in this case, is searched for a multi-step 
path to exchanging these assets via intermediate offers. This is not easily doable 
using the approach we have presented. 

A DEX of this nature is susceptible to front-running. In our case, block 
issuers are given a permanent advantage in resolving liability transactions over 
non-block-issuing users. Among them, however, exactly one may issue the next 
block, including the liabilities they resolved. 
Abstract. Sealed-bid auctions are a common way of allocating an asset 
among a set of parties but require trusting an auctioneer who analyses 
the bids and determines the winner. Many privacy-preserving computa- 
tion protocols for auctions have been proposed to eliminate the need for a 
trusted third party. However, they lack fairness, meaning that the adver- 
sary learns the outcome of the auction before honest parties and may 
choose to make the protocol fail without suffering any consequences. In 
this work, we propose efficient protocols for both first and second-price 
sealed-bid auctions with fairness against rational adversaries, leveraging 
secret cryptocurrency transactions and public smart contracts. In our 
approach, the bidders jointly compute the winner of the auction while pre- 
serving the privacy of losing bids and ensuring that cheaters are financially 
punished by losing a secret collateral deposit. We guarantee that it is never 
profitable for rational adversaries to cheat by making the deposit equal 
to the bid plus the cost of running the protocol, i.e., once a party com- 
mits to a bid, it is guaranteed that it has the funds and it cannot walk 
away from the protocol without forfeiting the bid. Moreover, our proto- 
cols ensure that the winner is determined and the auction payments are 
completed even if the adversary misbehaves so that it cannot force the 
protocol to fail and then rejoin the auction with an adjusted bid. In com- 
parison to the state-of-the-art, our constructions are both more efficient 
and furthermore achieve stronger security properties, i.e., fairness. Inter- 
estingly, we show how the second-price can be computed with a minimal 
increase of the complexity of the simpler first-price case. Moreover, in case 
there is no cheating, only collateral deposit and refund transactions must 
be sent to the smart contract, significantly saving on-chain storage. 
1 Introduction 


Auctions are a common way of allocating goods or services among a set of parties 
based on their bids, e.g., bandwidth spectrum, antiques, paintings, and slots for 
advertisements in the context of web search engines or social networks [17]. In the 
simplest form, there is a single indivisible object, and each bidder has a private 
valuation for the object. One of the main desirable properties in designing an 
auction is incentive compatibility, that is the auction must be designed in a way 
that the participating parties can maximize their expected utilities by bidding 
their true valuations of the object. According to their design, auctions can be 
categorized into open auctions and sealed-bid auctions [31]. 

We focus on the case of sealed-bid auctions, constructing protocols where 
parties holding a private bid do not have to rely on trusted third parties to 
ensure bid privacy. In a sealed bid auction, each bidder communicates her bid to 
the auctioneer privately. Then, the auctioneer is expected to declare the highest 
bidder as the winner and not to disclose the losing bids. In particular, in the 
sealed-bid first-price auction, the bidder submitting the highest bid wins the 
auction and pays what she bids, while in the sealed-bid second-price auction (i.e., 
the Vickrey auction [41]) the bidder submitting the highest bid wins the auction 
but pays the amount of the second-highest bid [30]. It is well-known that in the 
second-price auctions bidding truthfully is a dominant strategy, but no dominant 
strategy exists in the case of first-price auctions. Moreover, while in both first- 
price and second-price auctions, a dishonest auctioneer may disclose the losing 
bids, the second-price auction, in particular, highly depends on trusting the 
auctioneer. Indeed, a dishonest auctioneer may substitute the second-highest 
bid with a bid that is slightly smaller than the first bid to increase her revenue. 
Therefore, it may be impossible or expensive to apply in certain scenarios. As 
a result, constructing cryptographic protocols for auctioneer-free and transparent 
auction solutions is of great interest. 


1.1 Our Contributions 


In this paper, we propose Fair Auctions via Secret Transactions (FAST), in 
which there is no trusted auctioneer and where rational adversaries are always 
incentivized to complete protocol execution through a secret collateral deposit. 
The proposed protocol is such that each party can make sure the winning bid 
is the actual bid submitted by the winning party, and malicious parties can be 
identified, financially punished and removed from the execution (guaranteeing a 
winner is always determined). Our contributions are summarized as follows: 


— We propose using secret collateral deposits dependent on private bid inputs 
to ensure that the optimal strategy is for parties to complete the protocol. 

— (Sect.3) We propose methods for implementing a financial punishment mech- 
anism based on secret deposits and standard public smart contracts, which 
can be used to ensure the fair execution of our protocols. 
— (Sect. 4) We propose cheater-identifiable and publicly verifiable sealed-bid 
auction protocols compatible with our secret deposit approach and more effi- 
cient than the state-of-the-art [3]. Our protocols are guaranteed to terminate, 
finding the winner, and paying the seller even if cheating occurs. 


To achieve fairness in an auction setting, we require each party to provide a 
secret deposit of an amount of cryptocurrency equal to the party’s private bid 
plus the cost of executing the protocol. In case a party is found to be cheating, a 
smart contract automatically redistributes cheaters’ deposits among the honest 
parties, the cheater is eliminated and the remaining parties re-execute the proto- 
col using their initial bids/deposits. Having a bid-dependent deposit guarantees 
that it is always more profitable to execute the protocol honestly than to cheat 
(as analyzed in Sect. 5). 

However, previous works that considered the use of cryptocurrency deposits 
for achieving fairness (e.g. [2,6,8,9,18,29]) crucially rely on deposits being pub- 
lic, thus using the same approach would reveal information about the bid. To 
overcome this, we propose using secret deposits that keep the value of the deposit 
secret until cheating is detected. Moreover, this ensures that the parties have 
sufficient funds to bid for the object (e.g., in a second-price auction, a party could 
bid very high just to figure out what the second-highest price is and then claim 
that her submitted bid was just a mistake). Our protocols are publicly verifiable, i.e. 
it is possible to prove to the smart contract (and to any third party verifier) that 
a party has cheated. 

In relation to previous works (discussed in Sect. 1.3), we emphasize that: 


— While using deposits to achieve fairness represents a well-known technique, 
previous works considered public deposits only. 

— Public deposits are not suitable for applications such as sealed-bid auctions 
since in order to achieve fairness, bid-dependent deposits are required, and 
public deposits would reveal information about the bid. For this reason, we 
introduce secret deposits, which represent a novel technique. 

— From a sealed bid auction perspective, our protocol improves the state-of- 
the-art both in terms of efficiency and security guarantees, i.e., it achieves 
fairness (while in previous works the adversary may learn the outcome of the 
auction before honest parties and abort without suffering any consequences). 

— No previous work in this setting considers adaptive adversaries since it would 
drastically increase the complexity of the protocol. For this reason, we focus 
on the static adversary case only. 


1.2 Our Techniques 


We start with a first-price sealed-bid auction protocol that builds on a simple 
passively secure protocol similar to that of SEAL [3] and compile it to achieve 
active security. However, we not only obtain an actively secure protocol but 
also add cheater identification and public verifiability properties. We use these 
properties to add our financial punishment mechanism with secret deposits to 
this protocol. Even though our protocol achieves stronger security guarantees 
than SEAL (i.e., sequential composability and fairness guarantees), it is more 
efficient than the SEAL protocol as shown in Sect. 6. 


A Toy Example: Our protocol uses a modified version of the Anonymous Veto 
Protocol from [25] as a building block. The anonymous veto protocol allows a set 
of $n$ parties $P_1, \ldots, P_n$ to anonymously indicate whether they want to veto or not 
on a particular subject by essentially securely computing the logical-OR function 
of their inputs. In this protocol, each party $P_i$ has an input bit $d_i \in \{0,1\}$ with 
0 indicating no veto and 1 indicating veto, and they wish to compute $\bigvee_{i=1}^{n} d_i$. 

As proposed in [3], this simple anonymous veto protocol can be used for 
auctions by having parties evaluate their bids bit-by-bit, starting from the most 
significant bit and proceeding to execute the veto protocol for each bit in the 
following way: 1. Until the first veto occurs, a party vetoes (inputs $d_i = 1$ in the 
veto protocol) if and only if the current bit of her bid is 1; 2. After the first 
veto, a party only vetoes if the bit of her bid in the last round in which a veto 
happened was 1 and the current bit is also 1. In other words, in this toy protocol, 
parties stop vetoing once they realize that there is another party with a higher bid 
(i.e., there was a veto in a round when their own bit was 0) and the party with the 
highest bid continues vetoing according to her bid until the last bit. Therefore, 
the veto protocol output represents the highest bid. However, a malicious party 
can choose not to follow the protocol, altering the output. 
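
The bit-by-bit procedure above can be sketched in a few lines of Python. This is a plain, non-cryptographic simulation under stated assumptions: the veto protocol is replaced by a direct logical OR, all parties are honest, and the names `toy_veto_auction`, `active` and `bit_length` are illustrative and do not come from [3] or [25].

```python
def toy_veto_auction(bids, bit_length):
    """Simulate the toy auction: one logical-OR (veto) round per bid bit."""
    # A party stays active until a veto occurs in a round where her bit is 0.
    active = [True] * len(bids)
    highest = 0
    for pos in range(bit_length - 1, -1, -1):   # most significant bit first
        bits = [(bid >> pos) & 1 for bid in bids]
        # Each party's veto input d_i for this round.
        vetoes = [is_active and bit == 1 for is_active, bit in zip(active, bits)]
        outcome = any(vetoes)                   # only the OR is revealed
        highest = (highest << 1) | int(outcome)
        if outcome:
            # Parties whose bit was 0 learn that a higher bid exists and stop.
            active = [is_active and bit == 1 for is_active, bit in zip(active, bits)]
    return highest


print(toy_veto_auction([5, 9, 12], bit_length=4))
```

The sequence of round outputs spells out the highest bid, while a losing party's individual inputs are hidden behind the OR. The sketch offers no protection against a party who deviates from the rules, which is the gap the actively secure protocol addresses.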


Achieving Active Security with Cheater Identification and Public 
Verifiability: To achieve active security with cheater identification and public 
verifiability, we start from a simple passively secure protocol and compile it into 
an actively secure protocol using NIZKs following an approach similar to that 
of [26,29]. This ensures that at every round of the protocol all parties’ inputs 
are computed according to the protocol rules, including previous rounds’ inputs 
and outputs. However, since the generic techniques from [26,29] yield highly 
inefficient protocols, we carefully construct tailor-made efficient non-interactive 
zero-knowledge proofs for our specific protocol, ensuring that it is efficient. 


Incentivizing Correct Behaviour with Secret Deposits: In order to create 
incentives for parties to behave honestly, a deposit based on their bids is required. 
However, a public deposit would leak information about the parties’ bids, which 
have to be kept secret. Hence, we use secret deposits, as discussed below, and keep 
the amount secret unless a party is identified as a cheater, in which case the 
cheater’s deposit is distributed among the honest parties. The cheater is then 
eliminated and the protocol is re-executed with the remaining parties using their 
initial bids/deposits so that a winner is determined. This makes it rational not 
to cheat both in the case of first and second-price auctions, i.e., cheating always 
implies a lower utility than behaving honestly (see Sect. 5). 


Achieving On-Chain Efficiency: In order to minimize the amount of on-chain 
communication, an approach based on techniques from [5] is adopted. Every time 
a message is sent from a given party to the other parties, all of them sign the 
message received and send the signature to each other. Communication is only 
done on-chain (through the smart contract) in case of suspected cheating. 
Secret Deposits to Public Smart Contracts: Since we use secret deposits 
based on confidential transactions [35], we need a mechanism to reveal the value 
of cheating parties’ deposits to the smart contract so it can punish cheaters. We 
do that by secret sharing trapdoor information used to reveal this value using a 
publicly verifiable secret sharing (PVSS) scheme [15] that allows us to prove in 
zero-knowledge both that the shares are valid and that they contain the trapdoor 
for a given deposit. These shares are held by a committee that does not act unless 
cheating is detected, in which case the committee members are reimbursed for 
reconstructing the trapdoor with funds from the cheater’s deposit itself. We 
discuss this approach in Sect. 3. Providing alternative methods for holding these 
deposits is an important open problem. 


1.3 Related Work 


Research on secure auctions started with the work of Nurmi and Salomaa [38] and 
Franklin and Reiter [23] in the 1990s. However, in these first constructions, 
the auctioneers open all bids at the end of the protocol, which reveals the losing 
bids to all parties. Since then, many sealed bid auction protocols have been 
proposed to protect the privacy of the losing bids, e.g., [1,4,27,32,33]. However, 
in most of these protocols, privacy is obtained by distributing the computation 
of the final outcome to a group of auctioneers. 

A lot of work has been done to remove the role of the trusted parties, e.g., 
by Brandt [11]. In these protocols, the bidders must compute the winning bid in 
a joint effort through emulating the role of the auctioneer. Moreover, the seller 
plays a role in the auction and it is assumed that the seller has no incentive 
to collude with other bidding parties. However, Dreier et al. [22] later pointed 
out that if the seller and a group of bidding parties collude with 
each other, then they can learn the bids of other parties. Besides weak security 
guarantees, the main drawback of the protocol proposed by Brandt [11] is that 
it has exponential computational and communication complexities. 

There have been implementations of secure auctions, including [10], which has been 
deployed in practice for the annual sugar beet auction in Denmark. Other 
works [36] have considered the use of rational cryptography in enhancing privacy. 
Finally, the current state-of-the-art in protocols for secure First-Price Sealed- 
Bid Auctions was achieved in SEAL [3], which we compare with our protocols 
in detail in Sect. 6. To the best of our knowledge, none of these works considers 
incentives for the parties to complete the protocol or punishment for cheaters. 

An often desired feature of Secure Multiparty Computation (MPC) is that 
if a cheating party obtains the output, then all the honest parties should do so 
as well. Protocols that guarantee this are also called fair and are known to be 
impossible to achieve with dishonest majorities [16]. Recently, Andrychowicz et 
al. [2] (and independently Bentov & Kumaresan [8]) initiated a line of research 
that aims at incentivizing fairness in MPC by imposing cryptocurrency-based 
financial penalties on misbehaving parties. A line of work [9,18] culminating 
in [6] improved the performance of this approach with respect to the amount of 
on-chain storage and size of the collateral deposits from each party, while others 
obtained stronger notions of fairness [29]. However, all of these works focus on 
using public collateral deposits for incentivizing fairness, which is not possible for 
our application. Moreover, they rely on general-purpose MPC, while we provide a 
highly optimized specific purpose protocol for auctions with financial incentives. 
The protocols of [21,24] are also based on cryptocurrencies. The work of [24] 
is the closest to ours as it leverages a cryptocurrency to ensure fairness, but it 
relies on SGX trusted execution enclaves. 


2 Preliminaries 


Let $y \xleftarrow{\$} F(x)$ denote running the randomized algorithm $F$ with input $x$ and 
implicit randomness, obtaining the output $y$. When the randomness $r$ is specified, 
we use $y \leftarrow F(x; r)$. For a set $\mathcal{X}$, let $x \xleftarrow{\$} \mathcal{X}$ denote $x$ chosen uniformly at random 
from $\mathcal{X}$; and for a distribution $\mathcal{Y}$, let $y \xleftarrow{\$} \mathcal{Y}$ denote $y$ sampled according to the 
distribution $\mathcal{Y}$. We denote concatenation of two values $x$ and $y$ by $x|y$. We denote 
negligible functions as $\mathsf{negl}(\kappa)$. We denote two computationally indistinguishable 
ensembles $X = \{X_{\kappa,z}\}_{\kappa \in \mathbb{N}, z \in \{0,1\}^*}$ and $Y = \{Y_{\kappa,z}\}_{\kappa \in \mathbb{N}, z \in \{0,1\}^*}$ of binary random 
variables by $X \approx_c Y$. For a field $\mathbb{F}$ we denote by $\mathbb{F}[X]_{\leq m}$ the vector space of 
polynomials in $\mathbb{F}[X]$ of degree at most $m$. 


2.1 Security Model and Setup Assumptions 


We prove our protocol secure in the real/ideal simulation paradigm with 
sequential composition. This paradigm is commonly used to analyse cryptographic 
protocol security and provides strong security guarantees, namely that several 
instances of the protocol can be executed in sequence while preserving their 
security. To prove security, a real world and an ideal world are defined and 
compared. In the real world, the protocol $\pi$ is executed with the parties, some of 
which are corrupted and controlled by the adversary $\mathcal{A}$. In the ideal world, the 
protocol is replaced by an ideal functionality $\mathcal{F}$ and a simulator $\mathcal{S}$ interacts with 
it. The ideal functionality $\mathcal{F}$ describes the behaviour that is expected from the 
protocol and acts as a trusted entity. A protocol $\pi$ is said to securely realize the 
ideal functionality $\mathcal{F}$ if, for every polynomial-time adversary $\mathcal{A}$ in the real world, 
there is a polynomial-time simulator $\mathcal{S}$ for the ideal world, such that the two 
worlds cannot be distinguished. In more detail, no probabilistic polynomial-time 
distinguisher $\mathcal{D}$ can have a non-negligible advantage in distinguishing the 
concatenation of the output of the honest parties and of the adversary $\mathcal{A}$ in the real 
world from the concatenation of the output of the honest parties (which come 
directly from $\mathcal{F}$) and of the simulator $\mathcal{S}$ in the ideal world. More details about 
this model are in [14]. Our protocol uses the Random Oracle Model (ROM). 
Note that adopting the UC model as an alternative would require using UC-secure 
NIZKs (instead of those described subsequently) and would reduce the efficiency of 
the protocol. Also, previous works consider the sequential composability model 
only. 
Adversarial Model: We consider malicious adversaries that may deviate from 
the protocol in any arbitrary way. Moreover, we consider the static case, where 
the adversary is only allowed to corrupt parties before protocol execution starts 
and parties remain corrupted (or not) throughout the execution. We also 
assume that parties have access to synchronous communication channels, i.e., 
all messages are delivered within a given round with a known maximum delay. 


Decisional Diffie-Hellman (DDH) Assumption: The DDH problem consists 
in deciding whether $c = ab$ or $c \xleftarrow{\$} \mathbb{Z}_p$ in a tuple $(g, g^a, g^b, g^c)$ where $g$ is a 
generator of a group $\mathbb{G}$ of order $p$, and $a, b \xleftarrow{\$} \mathbb{Z}_p$. The DDH assumption states 
that the DDH problem is hard for every PPT distinguisher. It is well known 
that the DDH assumption implies the Discrete Logarithm assumption. 


2.2 Building Blocks 


Pedersen Commitments: Let $p$ and $q$ be large primes such that $q$ divides $p - 1$ 
and let $\mathbb{G}$ be the unique subgroup of $\mathbb{Z}_p^*$ of order $q$. All the computations in $\mathbb{G}$ 
are operations modulo $p$, however we omit the mod $p$ to simplify the notation. 
Let $g, h$ denote random generators of $\mathbb{G}$ such that nobody knows the discrete 
logarithm of $h$ base $g$, i.e., a value $w$ such that $g^w = h$. The Pedersen 
commitment scheme [40] to an $s \in \mathbb{Z}_q$ is obtained by sampling $t \xleftarrow{\$} \mathbb{Z}_q$ and computing 
$\mathsf{com}(s,t) = g^s h^t$. Hence, the commitment $\mathsf{com}(s,t)$ is a value uniformly distributed 
in $\mathbb{G}$ and opening the commitment requires revealing the values of $s$ and $t$. 
Pedersen commitments are additively homomorphic, i.e., starting from the 
commitments to $s_1 \in \mathbb{Z}_q$ and $s_2 \in \mathbb{Z}_q$, it is possible to compute a commitment to 
$s_1 + s_2 \in \mathbb{Z}_q$, i.e., $\mathsf{com}(s_1,t_1) \cdot \mathsf{com}(s_2,t_2) = \mathsf{com}(s_1 + s_2, t_1 + t_2)$. 
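
The additive homomorphism can be checked numerically with a minimal sketch. The parameters below ($p = 23$, $q = 11$, $g = 4$, $h = 9$) are toy values chosen for illustration only; they are an assumption of this example, they are far too small to be secure, and the discrete logarithm of $h$ base $g$ is trivially computable for them. The function name `commit` is likewise illustrative.

```python
import secrets

# Toy parameters (assumption, illustration only): q = 11 divides p - 1 = 22.
P, Q = 23, 11
# Both values generate the order-11 subgroup of Z_23^*. In a real setup nobody
# may know log_G(H); that does not hold for these toy values.
G, H = 4, 9


def commit(s, t):
    """Pedersen commitment com(s, t) = g^s * h^t mod p."""
    return (pow(G, s % Q, P) * pow(H, t % Q, P)) % P


s1, t1 = 3, secrets.randbelow(Q)
s2, t2 = 5, secrets.randbelow(Q)

# Multiplying two commitments yields a commitment to the sum of the values
# under the sum of the randomness.
product = (commit(s1, t1) * commit(s2, t2)) % P
assert product == commit(s1 + s2, t1 + t2)
```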


Simplified UTXO Model: In order to focus on the novel aspects of our pro- 
tocol, we represent cryptocurrency transactions under a simplified version of the 
Bitcoin UTXO model [37]. For the sake of simplicity, we only consider operations 
of the “Pay to Public Key” (P2PK) output type, which we later show how to 
realize while keeping the values of transactions private. The formal description 
of the adopted simplified UTXO model is discussed in the full version [20]. 


Confidential Transactions: In the case of confidential transactions [35] the 
input and output amounts are kept secret using Pedersen commitments. However, 
in order to achieve public verifiability, the transactions contain a 
zero-knowledge proof that the sum of the inputs is equal to the sum of the outputs, 
and that all the outputs lie in the range $[0, 2^l - 1]$ (which can be computed with 
Bulletproofs [12]). Note that the input set $\mathsf{In}$ in confidential transactions can 
also be public (i.e., $\mathsf{In} = \{(id_1, in_1), \ldots, (id_m, in_m)\}$), as long as the outputs 
are kept private. In particular, confidential transactions can be formally defined 
by modifying the simplified UTXO model described above as follows: 


— Representing inputs and outputs: Set $\mathsf{In}$ is defined as 
$\mathsf{In} = \{(id_1, \mathsf{com}(in_1, r_{in_1})), \ldots, (id_m, \mathsf{com}(in_m, r_{in_m}))\}$ and set $\mathsf{Out}$ is defined as 
$\mathsf{Out} = \{(\mathsf{com}(out_1, r_{out_1}), Addr_1), \ldots, (\mathsf{com}(out_n, r_{out_n}), Addr_n)\}$. 

— Generate Transaction with In, Out: Compute 
$$\frac{\prod_{j=1}^{n} \mathsf{com}(out_j, r_{out_j})}{\prod_{i=1}^{m} \mathsf{com}(in_i, r_{in_i})} = \mathsf{com}\Big(0, \sum_{j=1}^{n} r_{out_j} - \sum_{i=1}^{m} r_{in_i}\Big)$$ 
with $r_{in_i}, r_{out_j} \xleftarrow{\$} \mathbb{Z}_q$, include in the transaction 
the randomness $\sum_{j=1}^{n} r_{out_j} - \sum_{i=1}^{m} r_{in_i}$ and the range proofs $\pi$ guaranteeing 
that $out_1, \ldots, out_n$ are in the range $[0, 2^l - 1]$. The resulting transaction is then 
represented by $tx = (id, \mathsf{In}, \mathsf{Out}, Sig, \sum_{j=1}^{n} r_{out_j} - \sum_{i=1}^{m} r_{in_i}, \pi)$. 

— Validate a Transaction tx: Compute 
$$\frac{\prod_{j=1}^{n} \mathsf{com}(out_j, r_{out_j})}{\prod_{i=1}^{m} \mathsf{com}(in_i, r_{in_i})} = \mathsf{com}(s,t)$$ 
and check if the obtained commitment is equal to $\mathsf{com}(0, \sum_{j=1}^{n} r_{out_j} - \sum_{i=1}^{m} r_{in_i})$, 
guaranteeing that $\sum_{i=1}^{m} in_i = \sum_{j=1}^{n} out_j$, then check the validity of the range 
proofs $\pi$. 

— Spend a transaction output Out: Parse $\mathsf{Out} = (\mathsf{com}(out_i, r_{out_i}), Addr_i)$. 
To spend $\mathsf{Out}$, the commitment $\mathsf{com}(out_i, r_{out_i}) = g^{out_i} h^{r_{out_i}}$ has to be opened 
by revealing $out_i$ and $r_{out_i}$. Values $out_i$ and $r_{out_i}$ are included in a regular 
UTXO transaction, as described in the full version [20]. Later on, 
this UTXO transaction can be validated by checking that $out_i, r_{out_i}$ is a 
valid opening of $\mathsf{com}(out_i, r_{out_i})$ and following the steps of a regular UTXO 
transaction validation. 

— Spend a transaction output Out with a NIZKPoK of $r_{out_i}$: Alternatively, 
an output $\mathsf{Out} = (\mathsf{com}(out_i, r_{out_i}), Addr_i)$ for which only $out_i$ and 
$\hat{h} = h^{r_{out_i}}$ (but not $r_{out_i}$) are known can be spent if a NIZK $\pi'$ proving 
knowledge of $r_{out_i}$ is also available. Notice that knowing $out_i$ is sufficient 
for validating the regular UTXO transaction created using $\mathsf{Out}$ as an input. 
Moreover, it can be checked that $g^{out_i} \hat{h} = \mathsf{com}(out_i, r_{out_i})$ given $out_i$ and 
$\hat{h} = h^{r_{out_i}}$, while the proof $\pi'$ guarantees that $\hat{h} = h^{r_{out_i}}$ is well formed.¹ 
Values $out_i$, $h^{r_{out_i}}$ and the proof $\pi'$ are included in a regular UTXO transaction, 
as described in the full version [20]. Later on, this UTXO 
transaction can be validated by checking that $g^{out_i} h^{r_{out_i}} = \mathsf{com}(out_i, r_{out_i})$, 
checking that $\pi'$ is valid and following the steps of a regular UTXO 
transaction validation. 
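
The balance check used when generating and validating a confidential transaction can be illustrated with a small sketch. It assumes the same kind of toy Pedersen parameters as a classroom example ($p = 23$, $q = 11$, $g = 4$, $h = 9$, all hypothetical and insecure), omits signatures, addresses and the range proofs $\pi$, and uses illustrative function names (`generate`, `validate`) that are not taken from [35] or [20].

```python
import secrets

# Toy group parameters (assumption): far too small for real use.
P, Q, G, H = 23, 11, 4, 9


def commit(v, r):
    """Pedersen commitment com(v, r) = g^v * h^r mod p."""
    return (pow(G, v % Q, P) * pow(H, r % Q, P)) % P


def product(elems):
    """Multiply group elements modulo p."""
    acc = 1
    for e in elems:
        acc = (acc * e) % P
    return acc


def generate(in_vals, out_vals):
    """Commit to all amounts and publish the excess randomness."""
    r_in = [secrets.randbelow(Q) for _ in in_vals]
    r_out = [secrets.randbelow(Q) for _ in out_vals]
    ins = [commit(v, r) for v, r in zip(in_vals, r_in)]
    outs = [commit(v, r) for v, r in zip(out_vals, r_out)]
    excess = (sum(r_out) - sum(r_in)) % Q
    return ins, outs, excess


def validate(ins, outs, excess):
    """Check prod(outs) / prod(ins) == com(0, excess); range proofs omitted."""
    ratio = (product(outs) * pow(product(ins), -1, P)) % P
    return ratio == commit(0, excess)


print(validate(*generate([3, 4], [5, 2])))   # inputs and outputs balance
print(validate(*generate([3, 4], [5, 3])))   # outputs exceed inputs by 1
```

In this toy group the balance is only checked modulo $q = 11$, which is one reason the range proofs are needed in the actual construction: without them, amounts could wrap around the group order.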


Publicly Verifiable Secret Sharing (PVSS): In our work, we use the PVSS 
protocol $\pi_{PVSS}$ from [15]. A PVSS protocol allows for a dealer to distribute 
encrypted shares to a set of parties in such a way that only one specific party 
can decrypt a share but any third party verifier can check that all shares are 
valid. Later on, each party can decrypt its corresponding share to allow for 
reconstruction while showing to any third-party verifier that the decrypted share 
corresponds to one of the initial encrypted shares. A deposit committee $C = 
\{C_1, \ldots, C_m\}$ will execute this protocol verifying and decrypting shares provided 
as part of our secret deposit mechanism (further discussed in Sect. 3). Since the 
parties in $C$ executing $\pi_{PVSS}$ must have public keys registered as part of a setup 
phase, we capture this requirement in $\mathcal{F}_{SC}$ as presented in Sect. 2.2. 
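
The share-and-reconstruct core underlying such a scheme can be illustrated with plain Shamir secret sharing. The sketch below is a simplification under explicit assumptions and is not the protocol from [15]: shares are not encrypted, no validity proofs are produced, the secret is a field element rather than a group element, and the modulus and the function names `share` and `reconstruct` are hypothetical.

```python
import secrets

# Toy field modulus (assumption): the Mersenne prime 2^127 - 1.
Q = 2**127 - 1


def share(secret, threshold, n):
    """Split `secret` into n shares; any `threshold` of them reconstruct it."""
    # Random polynomial of degree threshold - 1 with p(0) = secret.
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(threshold - 1)]

    def p(x):
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation modulo Q
            acc = (acc * x + c) % Q
        return acc

    return [(i, p(i)) for i in range(1, n + 1)]


def reconstruct(points):
    """Lagrange interpolation at x = 0 over the given (x, y) shares."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (-xj)) % Q
                den = (den * (xi - xj)) % Q
        total = (total + yi * num * pow(den, -1, Q)) % Q
    return total


shares = share(42, threshold=3, n=5)
print(reconstruct(shares[:3]) == 42)
```

A PVSS scheme adds, on top of this, encryption of each share under the recipient's public key and proofs that let any third party check the shares without decrypting them.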


1 In fact, showing such a proof of knowledge $\pi'$ of $r_{out_i}$ together with $h^{r_{out_i}}$ and $out_i$ 
makes it easy to adapt the reduction of the binding property of the Pedersen commitment 
scheme to the Discrete Logarithm assumption. Instead of obtaining $r_{out_i}$ from the 
adversary, the reduction simply extracts it from $\pi'$. 
NIZK for PVSS Share Consistency CC: As part of our secret deposit 
mechanism (further discussed in Sect. 3), we will use a NIZK showing that shares 
computed with the PVSS protocol $\pi_{PVSS}$ from [15] encode secrets $g^m$ and $h^r$ 
that are terms of a Pedersen commitment $c = g^m h^r$. Formally, given generators 
$g_1, \ldots, g_n, g, h$ of a cyclic group $\mathbb{G}_q$ of prime order $q$, pairwise distinct elements 
$\alpha_1, \ldots, \alpha_n$ in $\mathbb{Z}_q$ and a Pedersen commitment $c = g^m h^r$ known by prover and 
verifier, for $p(x)$ and $m, r$ known by the prover, this NIZK is used to prove 
that 
$$(\hat{\sigma}_1, \ldots, \hat{\sigma}_n) \in \left\{ \left(g_1^{p(\alpha_1)}, \ldots, g_n^{p(\alpha_n)}\right) : p \in \mathbb{Z}_q[X],\ p(-1) = g^m,\ p(-2) = h^r \right\}.$$ 
We denote this NIZK by $CC((g_i)_{i \in [n]}, (\alpha_i)_{i \in [n]}, g, h, c, (\hat{\sigma}_i)_{i \in [n]})$. Notice that this 
NIZK can be constructed using the techniques from [13] and integrated with the 
NIZK LDEI (Low-Degree Exponent Interpolation) defined in $\pi_{PVSS}$ [15]. 


Modelling a Stateful Smart Contract: We employ a stateful smart contract 
functionality $\mathcal{F}_{SC}$ similar to that of [18] in order to model the smart contract 
that implements the financial punishment mechanism for our protocol. For the 
sake of simplicity, we assume that each instance of $\mathcal{F}_{SC}$ is already parameterized 
by the address of the auctioneer party who will receive the payment for the 
auctioned good, as well as by the identities (and public keys) of the parties in 
a secret deposit committee $C$ that will help the smart contract to open secret 
deposits given by parties in case cheating is detected. We also assume that $\mathcal{F}_{SC}$ 
has a protocol verification mechanism $\mathsf{pv}$ for verifying the validity of protocol 
messages. For a description of $\mathcal{F}_{SC}$, see the full version [20]. 


3 Secret Deposits in Public Smart Contracts 


When using secret deposits as in our application, it is implied that there exists 
a secret trapdoor that can be used to reveal the value of such deposits (and 
transfer them). However, since we base our financial punishment mechanism on 
a standard public smart contract, we cannot expose the trapdoor to the smart 
contract. Instead, we propose that a committee $C = \{C_1, \ldots, C_m\}$ with $m/2 + 2$ 
honest members² holds this trapdoor in a secret shared form. This committee 
does not act unless a cheating party needs to be punished and the trapdoor 
needs to be reconstructed to allow the smart contract to transfer her collateral 
deposit. In this case, the committee can be reimbursed from the collateral funds. 
We present a practical construction following this approach. Proposing methods 
for keeping custody of such secret deposits is left as an important open problem. 


A Possible Solution: A feasible but not practical approach to do this would be 
storing the trapdoor with the mechanism proposed in [7], where a secret is kept 
by obliviously and randomly chosen committees by means of a proactive secret 
sharing scheme where each current committee “encrypts the secret to the future” 
in such a way that the next committee can open it. However, it is also necessary
to ensure that the secrets actually correspond to the trapdoor for the parties'
deposits. Providing such proofs with the scheme of [7] would require expensive
generic zero-knowledge techniques (or a trusted setup for a zk-SNARK).

² We need $m/2 + 2$ honest members to instantiate our packed publicly verifiable secret
sharing based solution, where two group elements are secret shared with a single
share vector.


Protocol $\Pi_C$


Let $C = \{C_1, \ldots, C_m\}$ be the deposit committee members and let $pk_{C_1}, \ldots, pk_{C_m}$ and
$sk_{C_1}, \ldots, sk_{C_m}$ be their public keys and private keys, respectively, used to run $\pi_{PVSS}$.
Moreover, let $\widetilde{pk}_{C_1}, \ldots, \widetilde{pk}_{C_m}$ and $\widetilde{sk}_{C_1}, \ldots, \widetilde{sk}_{C_m}$ be their public keys and private
keys, respectively, used for signatures. The following steps are executed by $C_j \in C$:


- Setup verification: Upon receiving (SETUP, sid, $P_i$, $tx_i$, $pk_i$, $(\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im})$,
  $LDEI_i$, $CC_i$) from $P_i$, $C_j$ checks that $tx_i$ is valid, verifies the correctness of the
  shares $(\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im})$ with respect to the committee public keys $pk_{C_1}, \ldots, pk_{C_m}$
  using the verification procedure of $\pi_{PVSS}$ through $LDEI_i$, and verifies the
  NIZK $CC_i$. If all the checks pass, $C_j$ computes the hashes $SH1_i = H(tx_i, pk_i)$
  and $SH2_i = H((\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im}), LDEI_i, CC_i)$ and the signature
  $sig_{C_j,i} = sig_{\widetilde{sk}_{C_j}}(SH1_i | SH2_i)$, then sends (SETUP-VERIFICATION, sid, $sig_{C_j,i}$) to $P_i$.

- Share decryption: Upon receiving (OPEN, sid, $P_i$) from $\mathcal{F}_{SC}$, $C_j$ uses the share
  decryption procedure from $\pi_{PVSS}$ on $\hat{\sigma}_{ij}$, obtaining $\sigma_{ij}$ and $DLEQ_{ij}$, and sends
  (SHARE-DECRYPTION, sid, $(\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im})$, $LDEI_i$, $CC_i$, $\sigma_{ij}$, $DLEQ_{ij}$) to $\mathcal{F}_{SC}$.


Fig. 1. Protocol $\Pi_C$


A Protocol Based on PVSS: As an alternative, we propose leveraging the
structure of our confidential transaction based deposits to secret share their
openings with a recent efficient publicly verifiable secret sharing (PVSS) scheme
called Albatross [15]. Notice that the secret amount information $b_i$ in these
deposits is represented as a Pedersen commitment $g^{b_i} h^{r_{b_i}}$ and that the Albatross
PVSS scheme also allows for sharing a group element $g^s$, while proving in
zero-knowledge discrete logarithm relations involving $g^s$ in such a way that they can
be verified by any third party with access to the public encrypted shares. Hence,
we propose limiting the bit length of the bid $b_i$ in such a way that we can employ
the same trick as in lifted ElGamal and have each party $P_i$ share both $g^{b_i}$ and
$h^{r_{b_i}}$ with the Albatross PVSS while proving that their public encrypted shares
correspond to a secret deposit $g^{b_i} h^{r_{b_i}}$. The validity of this claim can be verified
by the committee $C$ itself or the smart contract during Stage 1 - Setup. Later
on, if $b_i$ needs to be recovered, $C$ can reconstruct $g^{b_i}$, brute force $b_i$ (because it
has a restricted bit-length) and deliver it to the smart contract while proving
it has been correctly computed from the encrypted shares. As we explain in
Sect. 2, recovering $b_i$ and $g^{b_i}$ along with the proofs of share validity is sufficient
for transferring the secret deposit.
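The brute-force step mentioned above can be illustrated with a minimal sketch. It assumes a toy group (the order-509 subgroup of $\mathbb{Z}_{1019}^*$ generated by 4) chosen only so the example runs instantly; the function names `lift` and `recover_bid` are hypothetical and not part of the protocol description. It shows why restricting the bit-length of the bid keeps recovery from $g^{b_i}$ feasible.

```python
# Toy parameters (assumptions for illustration only): the order-509 subgroup
# of Z_1019^*, generated by g = 4. Real deployments use much larger groups.
P, Q, G = 1019, 509, 4


def lift(bid: int) -> int:
    """Encode a bid 'in the exponent' as g^bid, as in lifted ElGamal."""
    return pow(G, bid, P)


def recover_bid(target: int, bit_length: int) -> int:
    """Brute-force the exponent of `target` over all bit_length-bit bids."""
    acc = 1  # invariant: acc == g^candidate at the top of each iteration
    for candidate in range(2 ** bit_length):
        if acc == target:
            return candidate
        acc = (acc * G) % P
    raise ValueError("no bid of the given bit length matches")
```

The search costs at most $2^l$ group multiplications for an $l$-bit bid, which is why the bid length has to stay small for this approach to be practical.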

In Fig. 1, we present Protocol $\Pi_C$ followed by the committee $C = \{C_1, \ldots, C_m\}$
and executed as part of Protocol $\Pi_{FPA}$ described in Sect. 4. The interaction of
the other parties $P = \{P_1, \ldots, P_n\}$ executing Protocols $\Pi_{FPA}$ and $\Pi_{SPA}$ with
the committee $C$ is described as part of Stage 1 - Setup of these protocols.

Selecting Committees: In order to focus on the novel aspects of our constructions,
we assume that the smart contract captured by $\mathcal{F}_{SC}$ described in the full
version [20] is parameterized by a description of the committee $C = \{C_1, \ldots, C_m\}$
and the public keys corresponding to each committee member. Notice that in
practice this committee can be selected by the smart contract from the set of
parties executing the underlying blockchain consensus protocol. The problem of
selecting committees in a permissionless blockchain scenario has been extensively
addressed in both Proof-of-Stake (e.g., [19,28]) and Proof-of-Work [39] settings.


4 First-Price Auctions 


In this section, we introduce our protocol for first-price auctions (while the case of
second-price auctions is addressed in the full version [20]). We consider a setting
with $n$ parties $P_1, \ldots, P_n$, where each party $P_i$ has an $l$-bit bid $b_i = b_{i1}|\ldots|b_{il}$,
where $b_{ir}$ denotes the $r$-th bit of party $P_i$'s bid.


Functionality $\mathcal{F}_{FPA}$


$\mathcal{F}_{FPA}$ operates with an auctioneer $P_{Auc}$, a set of parties $P = \{P_1, \ldots, P_n\}$ who
have bids $b_1, \ldots, b_n$ as input and where $b_i = b_{i1}|\ldots|b_{il}$ is the bit representation of
$b_i$, as well as an adversary $\mathcal{S}_{FPA}$. $\mathcal{F}_{FPA}$ is parameterized by a bid bit-length $l$ and
keeps an initially empty list bids.


- Setup (Bid Registration): Upon receiving (BID, sid, coins($b_i$ + work)) from
  $P_i$ where $b_i \in \{0, 1\}^l$ and work is the amount required to compensate the cost
  of running the protocol for all the other parties, $\mathcal{F}_{FPA}$ appends $b_i$ to bids.

- First-Price Auction: After receiving (BID, sid, coins($b_i$ + work)) from all parties
  in $P$, for $r = 1, \ldots, l$ $\mathcal{F}_{FPA}$ proceeds as follows:

  1. Select $b_{wr}$, i.e., the $r$-th bit of the highest bid $b_w$ in the list bids.

  2. Send (ROUND-WINNER, sid, $b_{wr}$) to all parties and $\mathcal{S}_{FPA}$.

  3. Check if $b_{wr} = 1$ and $b_{ir} = 0$ for $i = 1, \ldots, n$, $i \neq w$. If so, let $r_w = r$, that is
     the first position where $b_w$ has a bit 1 and the second highest bid $b_{w_2}$ has a
     bit 0, and send (LEAK-TO-WINNER, sid, $r_w$) to $P_w$.

  4. Send (ABORT?, sid) to $\mathcal{S}_{FPA}$. If $\mathcal{S}_{FPA}$ answers with (ABORT, sid, $P_i$) where
     $P_i$ is corrupted, remove $b_i$ from bids, remove $P_i$ from $P$, send (ABORT,
     sid, $P_i$, coins($\frac{b_i + work}{|P|}$)) where $|P|$ is the number of remaining parties to all
     other parties in $P$, set again $r = 1$ and go to Step 1. If $\mathcal{S}_{FPA}$ answers with
     (PROCEED, sid), if $r = l$ go to Payout, else increment $r$ by 1 and go to Step 1.

- Payout: Send (REFUND, sid, $P_w$, coins($b_i$ + work)) to all parties $P_i \neq P_w$, send
  (REFUND, sid, coins(work)) to $P_w$, send coins($b_w$) to $P_{Auc}$, and halt.


Fig. 2. Functionality $\mathcal{F}_{FPA}$
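As a rough illustration of what the functionality in Fig. 2 outputs when nobody aborts, the following sketch replays the per-round announcements, the leak to the winner and the final payout. It is a simplification under stated assumptions: aborts and restarts are omitted, ties are broken by insertion order, and the name `ideal_first_price` and the `"auctioneer"` key are hypothetical.

```python
def ideal_first_price(bids: dict, bit_length: int, work: int):
    """Honest-run sketch of the ideal functionality: no aborts, no restarts.

    `bids` maps a party name to an integer bid below 2**bit_length.
    Returns (announced winner bits, leak position or None, payouts).
    """
    winner = max(bids, key=bids.get)           # ties: first party in dict order
    announced = []
    leak = None
    for r in range(1, bit_length + 1):         # r = 1 is the most significant bit
        shift = bit_length - r
        winner_bit = (bids[winner] >> shift) & 1
        announced.append(winner_bit)           # ROUND-WINNER message
        others_zero = all((b >> shift) & 1 == 0
                          for p, b in bids.items() if p != winner)
        # Only the first such round is recorded, matching the description of
        # r_w as the first position with this property.
        if leak is None and winner_bit == 1 and others_zero:
            leak = r                           # LEAK-TO-WINNER message
    payouts = {p: bids[p] + work for p in bids if p != winner}  # full refunds
    payouts[winner] = work                     # winner only gets `work` back
    payouts["auctioneer"] = bids[winner]       # auctioneer is paid the bid
    return announced, leak, payouts
```

For example, with bids 5, 3 and 12 on 4 bits, the announced bits are those of 12, the winner learns position 1, and the auctioneer receives 12 coins.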


Modelling Fair Auctions: First, we introduce an ideal model for fair auctions
that we will use to prove the security of our protocol. For the sake of simplicity,
when discussing this model, we use coins($n$) to indicate $n$ currency tokens being
transferred where $n$ is represented in binary, instead of describing a full UTXO
transaction. Our ideal functionality $\mathcal{F}_{FPA}$ is described in Fig. 2. This functionality
models the fact that the adversary may choose to abort but all it may learn
is that it was the winner and the most significant bit where its bid differs from
the second-highest bid. Regardless of adversarial actions, an auction result is
always obtained and the auctioneer (i.e., the party selling the asset) is always
paid. The second-price case is presented in the full version [20].


The Protocol: In Figs. 3, 4, 5 and 6, we construct a Protocol $\Pi_{FPA}$ that realizes
$\mathcal{F}_{FPA}$. This protocol is executed by $n$ parties $P_1, \ldots, P_n$, where each party $P_i$
has an $l$-bit bid $b_i = b_{i1}|\ldots|b_{il}$, and a deposit committee $C = \{C_1, \ldots, C_m\}$ that
helps open secret deposits from corrupted parties in the Recovery Stage. The
protocol consists of 4 main stages plus a recovery stage, which is only executed
in case of suspected (or detected) cheating. In the first stage, every party $P_i$ sends
to the smart contract a secret deposit, whose structure will be explained in detail
later. In the second and third stages, all parties jointly compute the maximum
bid (bit-by-bit) by using an anonymous veto protocol that computes a logical
OR on private inputs. To this aim, the parties start from the most significant bit
position. Then, they apply the anonymous veto protocol according to their bits
$b_{ir}$, with 0 representing no veto and 1 representing a veto. If the outcome of the
veto protocol (i.e., the logical OR of the inputs) is 1, then each party $P_i$ with
input $b_{ir} = 0$ figures out that there is at least one other party $P_k$ whose bid $b_k$ is
higher than $b_i$, and $P_i$ discovers that she cannot win the auction. Therefore, from
this point on, $P_i$ stops vetoing, disregarding her actual bit $b_{ir}$ in the next rounds.
Otherwise, $P_i$ is expected to keep vetoing or not according to her bit $b_{ir}$. Finally,
in Stage 4 the winning party $P_w$ executes the payment to the auctioneer (i.e., the
party selling the asset). Throughout all stages, the parties must provide proofs
that they have correctly computed all protocol messages. If a party is identified
as dishonest at any point, the Recovery Stage has to be executed.
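The bit-by-bit logic of Stages 2 and 3 can be sketched in the clear, with the anonymous veto protocol replaced by a plain logical OR. This is only meant to show why the per-round outcomes spell out the highest bid; it provides no privacy, and the function name `max_bid_by_veto` is hypothetical.

```python
def max_bid_by_veto(bids: list[int], bit_length: int) -> int:
    """Compute the highest bid bit by bit using only a logical OR per round."""
    still_running = [True] * len(bids)   # parties that may still win
    result = 0
    for r in range(1, bit_length + 1):   # most significant bit first
        shift = bit_length - r
        # A party vetoes iff it is still in the race and its current bit is 1.
        vetoes = [alive and (b >> shift) & 1 == 1
                  for alive, b in zip(still_running, bids)]
        outcome = any(vetoes)            # all that the veto protocol reveals
        result = (result << 1) | int(outcome)
        if outcome:
            # Parties that did not veto learn they cannot win and stop vetoing.
            still_running = vetoes
    return result
```

Each round appends one bit of the result, so an $l$-bit auction takes exactly $l$ veto rounds regardless of the number of parties.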


Security Analysis: It is clear that this protocol correctly computes the highest
bid. The ideal smart contract enforces payment once a winner is determined and
punishments otherwise. The security of this protocol is formally stated in the
following theorem. A game-theoretical analysis is presented in Sect. 5, where it
is shown that the best strategy for any rational party is to follow the protocol.


Theorem 1. Under the DDH Assumption, Protocol $\Pi_{FPA}$ securely computes
$\mathcal{F}_{FPA}$ in the $\mathcal{F}_{SC}$-hybrid, random oracle model against a malicious static adversary
$\mathcal{A}$ corrupting all but one of the parties $P_i \in P$ and $m/2 - 2$ parties $C_j \in C$.


Due to space limits, we leave the full proof to the full version [20]. 


5 Rational Strategies 


In this section, we consider the incentives of parties in our protocols. Note that
the set of bidders is fixed throughout the execution, i.e., once the execution has
started, even if it is required to re-execute the protocol, no new bid can be
submitted, and it is therefore not possible to gain from the leaked information.
Moreover, in case there is a cheating party, the protocols refund the honest
parties with her deposit.


Protocol $\Pi_{FPA}$ (Off-chain messages exchange)


Protocol $\Pi_{FPA}$ is executed with $n$ parties $P = \{P_1, \ldots, P_n\}$, where each party $P_i$
has an $l$-bit bid $b_i = b_{i1}|\ldots|b_{il}$, and a deposit committee $C = \{C_1, \ldots, C_m\}$. Parties
$P$, $C$ interact among themselves and with a smart contract $\mathcal{F}_{SC}$.


Off-chain messages exchange: To minimize the communication with the smart
contract, an approach based on [5] is adopted. Let $r$ be a generic round of the
protocol, then each party $P_i$ actually proceeds as follows when sending her messages:


- Round $r$: each $P_i$ sends $msg_{r,i}, sig_{sk_i}(msg_{r,i})$ to all the other parties;

- Round $r+1$: all the other parties $P_k$ for $k \in \{1, \ldots, n\} \setminus i$ sign the message
  received from party $i$ and send $msg_{r,i}, sig_{sk_k}(msg_{r,i})$ to all the other parties,
  allowing them to check if party $i$ sent no conflicting messages. Then, each party
  repeats from the instructions described in the previous round;

- Conflicting messages: in case $P_i$ sends conflicting messages $msg_{r,i} \neq msg'_{r,i}$
  to parties $P_k \neq P_{k'}$, $P_k$ or $P_{k'}$ send to the smart contract $msg_{r,i}, sig_{sk_i}(msg_{r,i})$
  and $msg'_{r,i}, sig_{sk_i}(msg'_{r,i})$ as a proof that $P_i$ was dishonest;

- Evidence of a message: in case it has to be proven that a message $msg_{r,i}$ has
  been sent by party $P_i$ in round $r$, the other parties send to the smart contract
  the signatures $sig_{sk_1}(msg_{r,i}), \ldots, sig_{sk_n}(msg_{r,i})$ along with the message $msg_{r,i}$.


Fig. 3. Protocol $\Pi_{FPA}$ (Off-chain messages exchange)
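The conflicting-message check from the off-chain exchange in Fig. 3 can be sketched as follows. This is a minimal illustration under a loud assumption: an HMAC under a per-party key stands in for the digital signature $sig_{sk_i}$ so that the example runs with the standard library only; a real instantiation needs a publicly verifiable signature scheme. The names `sign`, `verify` and `equivocation_proof` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, round_no: int, sender: str, msg: bytes) -> bytes:
    """Stand-in for sig_{sk_i}(msg_{r,i}); binds round, sender and message."""
    data = f"{round_no}|{sender}|".encode() + msg
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, round_no: int, sender: str, msg: bytes, tag: bytes) -> bool:
    """Check a tag produced by `sign` (assumption: verifier holds the key)."""
    return hmac.compare_digest(sign(key, round_no, sender, msg), tag)


def equivocation_proof(key, round_no, sender, received):
    """Return two validly signed, different messages from `sender`, if any.

    `received` is the list of (msg, tag) pairs that the other parties
    forwarded in round r+1 as coming from `sender` in round r.
    """
    valid = [(m, t) for m, t in received if verify(key, round_no, sender, m, t)]
    for i, (m1, t1) in enumerate(valid):
        for m2, t2 in valid[i + 1:]:
            if m1 != m2:
                return (m1, t1), (m2, t2)   # evidence to send to the contract
    return None
```

Pairs with invalid tags are ignored, so a party cannot be framed by someone forwarding a message it never signed.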

We now consider the utility of each party from participating in the protocol.
The utility function of a generic party $P_i$ in the first-price auction is
$u_i^{FPA}(b_1, \ldots, b_n) = v_i - b_i$ if $b_i > \max_{j \neq i} b_j$ and 0 otherwise, while in the second-price
auction it is instead $u_i^{SPA}(b_1, \ldots, b_n) = v_i - \max_{j \neq i} b_j$ if $b_i > \max_{j \neq i} b_j$ and
0 otherwise, where $v_i$ represents $P_i$'s private valuation of what is at stake in
the auction. It is known that in first-price auctions the optimal strategy for
each rational party depends on her beliefs regarding the other parties' valuations,
while in the second-price auction the optimal strategy for each party is to bid an
amount equal to her valuation regardless of the strategy of other parties [31,34],
i.e., $b_i = v_i$.

In case a party $P_i$ is honest, she always gets her deposit work back. Then,
if she is the winner, she gets what is at stake in the auction and pays $b_i$, while
if she is not the winner, she gets her entire deposit $b_i$ + work back. Therefore,
by following the protocol each rational party has a non-negative utility, i.e.,
$u_i(b_1, \ldots, b_n) \geq 0$. However, if a party cheats, her deposit $b_i$ + work is distributed
among the honest parties. Therefore, the utility of a cheating party, regardless of
whether her bid is the highest or not, is $u_i(b_1, \ldots, b_n) = -(b_i + work) < 0$,
which is strictly negative. Therefore, cheating is a dominated strategy for each
party, i.e., regardless of what other players do it always results in a lower utility.
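The utility functions above translate directly into code. The following sketch uses hypothetical function names and made-up example numbers; it only restates the formulas for the first-price case, the second-price case and a cheating party.

```python
def utility_first_price(i: int, bids: list, valuation: int) -> int:
    """v_i - b_i if party i holds the strictly highest bid, else 0."""
    highest_other = max(b for j, b in enumerate(bids) if j != i)
    return valuation - bids[i] if bids[i] > highest_other else 0


def utility_second_price(i: int, bids: list, valuation: int) -> int:
    """v_i - (highest other bid) if party i holds the highest bid, else 0."""
    highest_other = max(b for j, b in enumerate(bids) if j != i)
    return valuation - highest_other if bids[i] > highest_other else 0


def utility_cheater(i: int, bids: list, work: int) -> int:
    """A cheater forfeits her whole deposit b_i + work."""
    return -(bids[i] + work)
```

With bids 7, 5 and 3 and a valuation of 10 for the first party, honest participation yields 3 (first price) or 5 (second price), whereas cheating with work = 2 yields -9.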

Protocol $\Pi_{FPA}$ (Stage 1)


Stage 1 - Setup: Deposit committee parties $C_j \in C$ first execute the Setup Verification
step of $\Pi_C$ from Figure 1. All parties $P_i$ proceed as follows:


1. $P_i$ sends a secret deposit containing their bid $b_i$, change $change_i$ and a fee work
   to the smart contract through a confidential transaction (as described in
   Section 2.2). Let $Addr_i$ be the address associated to party $i$ and $Addr_s$ be the
   address associated to the smart contract, $P_i$ proceeds as follows:

   (a) $P_i$ sends (PARAM, sid) to $\mathcal{F}_{SC}$, receiving (PARAM, sid, $g$, $h$, $pk_{C_1}, \ldots, pk_{C_m}$).

   (b) $P_i$ computes the bit commitments as $c_{ir} = g^{b_{ir}} h^{r_{ir}}$, with $r_{ir} \xleftarrow{\$} \mathbb{Z}_q$, to each
       bit $b_{ir}$ of $b_i$, and the bid commitment as
       $c_i = \prod_{r=1}^{l} c_{ir}^{2^{l-r}} = g^{b_i} h^{\sum_{r=1}^{l} 2^{l-r} r_{ir}}$.
       Let $r_{b_i}$ be equal to $\sum_{r=1}^{l} 2^{l-r} r_{ir}$. Then, $c_i$ can be rewritten as
       $c_i = g^{b_i} h^{r_{b_i}} = com(b_i, r_{b_i})$.

   (c) Define sets $In = \{(id_i, in_i)\}$ and $Out = \{(c_i, Addr_s), (work, Addr_s),
       (com(change_i, r_{change_i}), Addr_i)\}$, where $c_i = com(b_i, r_{b_i})$ is the commitment
       to the bid $b_i$ previously computed at Step (b), work is the amount required
       to compensate the cost of running the protocol for all the other parties in
       $P$ and in $C$, $change_i = in_i - b_i - work$ and $r_{change_i} \xleftarrow{\$} \mathbb{Z}_q$. Note that, in this
       case, $in_i$ and work are public, while $b_i$ and $change_i$ are private.

   (d) Compute $r_{out} = r_{b_i} + r_{change_i}$, so as to allow the other parties later to
       verify that the sum of the inputs is equal to the sum of the outputs, i.e.,
       $c_i \cdot com(change_i, r_{change_i}) \stackrel{?}{=} com(in_i - work, r_{out})$.

   (e) Compute proofs $\pi = (\pi_{b_i}, \pi_{change_i})$ showing that $b_i, change_i \in [0, 2^l - 1]$, set
       $tx_i = (id, In, Out, sig, r_{b_i} + r_{change_i}, \pi)$.

   (f) Compute the shares $(\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im}, LDEI_i)$ of $g^{b_i}$ and $h^{r_{b_i}}$ using the
       distribution procedure from $\pi_{PVSS}$ with $pk_{C_1}, \ldots, pk_{C_m}$ received in step (a).

   (g) Compute $CC_i \leftarrow CC((pk_{C_j})_{j \in [m]}, (j)_{j \in [m]}, g, h, c_i, (\hat{\sigma}_{ij})_{j \in [m]})$ to prove
       consistency among the shares $(\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im})$ and the commitment terms $g^{b_i}$ and
       $h^{r_{b_i}}$ from $c_i = g^{b_i} h^{r_{b_i}}$.

   (h) Send (SETUP, sid, $P_i$, $tx_i$, $pk_i$, $(\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im})$, $LDEI_i$, $CC_i$) to each $C_j \in C$.

   (i) Upon receiving (SETUP-VERIFICATION, sid, $sig_{C_j,i}$) from all $C_j \in C$, compute
       $SH1_i = H(tx_i, pk_i)$ and $SH2_i = H((\hat{\sigma}_{i1}, \ldots, \hat{\sigma}_{im}), LDEI_i, CC_i)$ and
       send (SETUP, sid, $P_i$, $tx_i$, $pk_i$, $SH1_i$, $SH2_i$, $sig_{C_1,i}, \ldots, sig_{C_m,i}$) to $\mathcal{F}_{SC}$. If a
       party $C_a \in C$ does not send this message, proceed to the Recovery Stage.


2. $P_i$ samples $x_{ir} \xleftarrow{\$} \mathbb{Z}_q$ and computes $X_{ir} = g^{x_{ir}}$ for $r = 1, \ldots, l$, sending
   $c_{i1}, \ldots, c_{il}, X_{i1}, \ldots, X_{il}$ to all other parties.

3. Upon receiving all messages $c_{j1}, \ldots, c_{jl}, X_{j1}, \ldots, X_{jl}$ from other parties $P_j$, $P_i$
   computes $Y_{jk} = \prod_{m=1}^{j-1} X_{mk} / \prod_{m=j+1}^{n} X_{mk}$ for $j = 1, \ldots, n$, $k = 1, \ldots, l$, and
   verifies for each other party $P_j$ that $c_j = \prod_{k=1}^{l} c_{jk}^{2^{l-k}}$ for $j \in \{1, \ldots, n\} \setminus i$. If this
   verification fails or a message is not received, proceed to the Recovery Stage.


Fig. 4. Protocol $\Pi_{FPA}$ (Stage 1)
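The relation between the bit commitments and the bid commitment used in Stage 1 can be checked numerically. The sketch below assumes a toy group (the order-509 subgroup of $\mathbb{Z}_{1019}^*$ with $g = 4$ and $h = 9$) purely for illustration; in practice the group is large and the discrete logarithm of $h$ with respect to $g$ must be unknown. All function names are hypothetical.

```python
import secrets

# Toy group (assumption): order-509 subgroup of Z_1019^*; g = 4 and h = 9 are
# both squares, hence in the subgroup. Not secure, for illustration only.
P, Q, G, H = 1019, 509, 4, 9


def commit(value: int, randomness: int) -> int:
    """Pedersen commitment com(value, randomness) = g^value * h^randomness."""
    return (pow(G, value, P) * pow(H, randomness, P)) % P


def commit_bits(bid: int, bit_length: int):
    """Commit to each bit of `bid`, most significant bit first."""
    bits = [(bid >> (bit_length - r)) & 1 for r in range(1, bit_length + 1)]
    rand = [secrets.randbelow(Q) for _ in bits]
    return [commit(b, r) for b, r in zip(bits, rand)], rand


def aggregate(bit_commitments, bit_length: int) -> int:
    """Compute prod_r c_ir^(2^(l-r)); anyone can do this from public data."""
    acc = 1
    for r, c in enumerate(bit_commitments, start=1):
        acc = (acc * pow(c, 2 ** (bit_length - r), P)) % P
    return acc


def aggregate_randomness(rand, bit_length: int) -> int:
    """r_{b_i} = sum_r 2^(l-r) * r_ir mod q, known only to the bidder."""
    return sum(2 ** (bit_length - r) * x
               for r, x in enumerate(rand, start=1)) % Q
```

Because the aggregation uses only the public bit commitments, the other parties can recompute $c_i$ themselves, which is the check performed in step 3 of Stage 1.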


The above analysis shows that it is not rational for an adversary $\mathcal{A}$ controlling
a single party to deviate from the protocol. Next, we show that it is also the case
for an adversary $\mathcal{A}$ controlling more than one party. Let $P_i$, $P_j$ be two parties
controlled by $\mathcal{A}$ and let $v_A$ be the valuation of the adversary for what is at stake
in the auction. Without loss of generality let $b_i > b_j$. If $\mathcal{A}$ does not deviate from
the protocol, then her utility is either 0 (in case neither $b_i$ nor $b_j$ is the winning
bid) or $v_A - b_i$ (in case $b_i$ is the winning bid). Instead, if $\mathcal{A}$ deviates from the
protocol by making $P_i$ drop out, in case $b_j$ is not the second-highest bid, then
her utility is $-(work + b_i)$. If $b_j$ is the second-highest bid, $\mathcal{A}$ gets what is at
stake in the auction but her utility is $v_A - (b_i + work + b_j)$. Therefore $\mathcal{A}$ always
prefers to behave honestly.


Protocol $\Pi_{FPA}$ (Stages 2 and 3)


Stage 2 - Before First Veto: All parties $P_i$, starting from the most significant bit
$b_{i1}$ and moving bit-by-bit to the least significant bit $b_{il}$ of their bid $b_i = b_{i1}|\ldots|b_{il}$,
run in each round $r$ the anonymous veto protocol until the outcome is a veto (i.e.,
$V_r \neq 1$) for the first time. Therefore each party $P_i$ proceeds as follows:


1. Compute $v_{ir}$ as follows: if $b_{ir} = 0$ then $v_{ir} = Y_{ir}^{x_{ir}}$; if $b_{ir} = 1$ then $v_{ir} = g^{\bar{r}_{ir}}$ where
   $\bar{r}_{ir} \xleftarrow{\$} \mathbb{Z}_q$. Then generate a NIZK proving that $v_{ir}$ has been correctly computed:
   $BV_{ir} \leftarrow BV\{b_{ir}, r_{ir}, x_{ir}, \bar{r}_{ir} \mid (c_{ir} = h^{r_{ir}} \wedge v_{ir} = Y_{ir}^{x_{ir}} \wedge X_{ir} = g^{x_{ir}}) \vee
   (c_{ir}/g = h^{r_{ir}} \wedge v_{ir} = g^{\bar{r}_{ir}})\}$, sending a message $(v_{ir}, BV_{ir})$ to all parties.

2. Upon receiving all messages $(v_{kr}, BV_{kr})$ from other parties $P_k$, $P_i$ checks the
   proofs $BV_{kr}$ for $k \in \{1, \ldots, n\} \setminus i$ and, if all checks pass, computes $V_r = \prod_{k=1}^{n} v_{kr}$
   and then goes to Stage 3 if $V_r \neq 1$ (at least one veto), otherwise follows the steps
   in Stage 2 again until the round $r = l$. Note that, unless all the bids are equal
   to 0, at some point the condition $V_r \neq 1$ is satisfied. If a message is not received
   from party $P_k$ or if $BV_{kr}$ is invalid, proceed to the Recovery Stage.


Stage 3 - After First Veto: Let $\hat{r}$ denote the last round at which there was a veto
(i.e., $V_{\hat{r}} \neq 1$). All parties $P_i$, starting from $b_{i,\hat{r}+1}$ and moving bit-by-bit to the least
significant bit $b_{il}$ of their bid $b_i = b_{i1}|\ldots|b_{il}$, run in each round $r > \hat{r}$ the anonymous
veto protocol taking into account both the input bit $b_{ir}$ and the declared input bit
$d_{ir}$, defined as the value that satisfies the logical condition
$(b_{ir} = 0 \wedge d_{ir} = 0) \vee (b_{ir} = 1 \wedge d_{i\hat{r}} = 1 \wedge d_{ir} = 1) \vee (b_{ir} = 1 \wedge d_{i\hat{r}} = 0 \wedge d_{ir} = 0)$,
i.e., each party $P_i$ vetoes at round $r$ iff she also vetoed at round $\hat{r}$ (i.e., $d_{i\hat{r}} = 1$), and
her current input bit $b_{ir} = 1$. Therefore, each $P_i$ proceeds as follows:


1. Compute $v_{ir}$ as follows: if $b_{ir} = 0$, then $v_{ir} = Y_{ir}^{x_{ir}}$; if $d_{i\hat{r}} = 1 \wedge b_{ir} = 1$, then
   $v_{ir} = g^{\bar{r}_{ir}}$ where $\bar{r}_{ir} \xleftarrow{\$} \mathbb{Z}_q$; if $d_{i\hat{r}} = 0 \wedge b_{ir} = 1$, then $v_{ir} = Y_{ir}^{x_{ir}}$. Then generate
   a NIZK proving that $v_{ir}$ has been correctly computed:
   $AV_{ir} \leftarrow AV\{b_{ir}, r_{ir}, x_{ir}, \bar{r}_{i\hat{r}}, \bar{r}_{ir}, x_{i\hat{r}} \mid
   (c_{ir} = h^{r_{ir}} \wedge v_{ir} = Y_{ir}^{x_{ir}} \wedge X_{ir} = g^{x_{ir}}) \vee
   (c_{ir}/g = h^{r_{ir}} \wedge d_{i\hat{r}} = g^{\bar{r}_{i\hat{r}}} \wedge v_{ir} = g^{\bar{r}_{ir}}) \vee
   (c_{ir}/g = h^{r_{ir}} \wedge d_{i\hat{r}} = Y_{i\hat{r}}^{x_{i\hat{r}}} \wedge X_{i\hat{r}} = g^{x_{i\hat{r}}} \wedge v_{ir} = Y_{ir}^{x_{ir}} \wedge X_{ir} = g^{x_{ir}})\}$,
   sending a message $(v_{ir}, AV_{ir})$ to all parties.

2. Upon receiving all messages $(v_{kr}, AV_{kr})$ from other parties $P_k$, $P_i$ checks the
   proofs $AV_{kr}$ for $k \in \{1, \ldots, n\} \setminus i$ and, if all checks pass, computes $V_r = \prod_{k=1}^{n} v_{kr}$,
   following the steps in Stage 3 again until round $r = l$. If a message is not received
   from party $P_k$ or if $AV_{kr}$ is invalid, proceed to the Recovery Stage.


Fig. 5. Protocol $\Pi_{FPA}$ (Stages 2 and 3)
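A single round of the anonymous veto computation used in Stages 2 and 3 (Fig. 5) can be sketched without the zero-knowledge proofs. The sketch assumes a toy group (the order-509 subgroup of $\mathbb{Z}_{1019}^*$ generated by 4) and hypothetical function names. It shows that the product $V_r$ equals 1 exactly when nobody vetoes, except with small probability caused by the tiny group.

```python
import secrets

# Toy group (assumption): order-509 subgroup of Z_1019^*, generator g = 4.
P, Q, G = 1019, 509, 4


def keygen(n: int):
    """Each party i samples x_i and publishes X_i = g^{x_i}."""
    xs = [secrets.randbelow(Q - 1) + 1 for _ in range(n)]
    return xs, [pow(G, x, P) for x in xs]


def y_value(i: int, public) -> int:
    """Y_i = prod_{m<i} X_m / prod_{m>i} X_m."""
    num = den = 1
    for m, X in enumerate(public):
        if m < i:
            num = (num * X) % P
        elif m > i:
            den = (den * X) % P
    return (num * pow(den, -1, P)) % P   # modular inverse (Python 3.8+)


def veto_round(bits) -> bool:
    """Run one round; return True iff the combined value V differs from 1."""
    xs, public = keygen(len(bits))
    V = 1
    for i, bit in enumerate(bits):
        if bit == 0:
            v = pow(y_value(i, public), xs[i], P)         # no veto: Y_i^{x_i}
        else:
            v = pow(G, secrets.randbelow(Q - 1) + 1, P)   # veto: g^{random}
        V = (V * v) % P
    return V != 1
```

The no-veto case is exact because $\sum_i x_i y_i = 0$ when $Y_i = g^{y_i}$, so the product of all $Y_i^{x_i}$ is 1. In this toy group a veto goes undetected with probability about 1/508 per round, which is negligible only for cryptographically large groups.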

Protocol $\Pi_{FPA}$ (Stages 4 and Recovery)


Stage 4 - Output: At this point, each party $P_i$ knows the value of $V_r$ for each
round $r = 1, \ldots, l$ and the protocol proceeds as follows:

1. $P_i$ computes the winning bid as $b_w = b_{w1}|\ldots|b_{wl}$, such that $b_{wr} = 1$ if $V_r \neq 1$
   and $b_{wr} = 0$ if $V_r = 1$, and sends $b_w$ to all other parties (causing all parties $P_k$
   to sign $b_w$ and send $sig_{sk_k}(b_w)$ to each other). We denote by $P_w$ the winning
   party (i.e., the party whose bid is $b_w$).

2. $P_w$ opens the commitment to her bid $com(b_w, r_{b_w})$ towards the smart contract
   by sending (OUTPUT, sid, $P_w$, $b_w$, $r_{b_w}$, $\{sig_{sk_k}(b_w)\}_{k \in [n]}$) to $\mathcal{F}_{SC}$.

3. If $P_w$ does not open her commitment or if multiple parties open their commitments,
   $P_i$ proceeds to the Recovery Stage.

4. Finally, all parties who honestly completed the execution of the protocol receive
   a refund of their deposit from the smart contract, apart from the winning party,
   who only receives a refund equivalent to the work funds.


Recovery Stage: Parties $C_j \in C$ listen to $\mathcal{F}_{SC}$ and execute the Share Decryption
step of $\Pi_C$ from Figure 1 if requested. In case a party $P_i \in P$ is suspected of cheating,
the Recovery Stage is executed as follows to identify the cheater depending on the
exact suspected cheating:


- Missing message or signatures: a message $msg_{r,i}$ or a signature
  $sig_{sk_i}(msg_{r-1,i'})$, on a message $msg_{r-1,i'}$ by $P_{i'}$, expected to be sent in round $r$ by
  $P_i$ is not received by $P_k$. Then, $P_k$ sends to $\mathcal{F}_{SC}$ the message (RECOVERY-MISSING,
  sid, msg, $\{sig_{sk_k}(msg)\}_{k \in [n]}$), where msg is the last message signed by all parties,
  and waits for $\mathcal{F}_{SC}$ to request the missing message. In that way, $P_i$ is expected to
  send $msg_{r,i}$ or $sig_{sk_i}(msg_{r-1,i'})$ to $\mathcal{F}_{SC}$. If no action is taken, $P_i$ is identified as a
  cheater.

- Conflicting messages or invalid message: In round $r$, $P_i$ sends conflicting
  messages $msg_{r,i}, sig_{sk_i}(msg_{r,i})$ and $msg'_{r,i}, sig_{sk_i}(msg'_{r,i})$ to different parties $P_k$ and
  $P_{k'}$. In this case, $P_k$ and $P_{k'}$ set the conflicting messages as a proof of cheating
  $\pi_c = (msg_{r,i}, sig_{sk_i}(msg_{r,i}), msg'_{r,i}, sig_{sk_i}(msg'_{r,i}))$. Otherwise, if $P_i$ sends an invalid
  message $msg_{r,i}, sig_{sk_i}(msg_{r,i})$ to $P_k$ (i.e., the message does not follow the structure
  described in the protocol for messages in round $r$), $P_k$ uses this message as a proof
  of cheating $\pi_c = (msg_{r,i}, sig_{sk_i}(msg_{r,i}))$. $P_k$ sends (RECOVERY-CHEAT, sid, $P_i$, $\pi_c$)
  to the smart contract and $P_i$ is identified as a cheater.

Every party $P_i$ identified as a cheater loses her whole deposit ($b_i$ + work), which is
distributed to the other parties by $\mathcal{F}_{SC}$, and the protocol continues as follows:


- Re-execution (unknown $b_w$): in case $b_w$ has not been computed, the protocol is
  re-executed from Stage 2 excluding the parties identified as cheaters.

- Complete payment (known $b_w$ but unknown $P_w$): in case $b_w$ has been
  computed but $P_w$ does not send (OUTPUT, sid, $P_w$, $b_w$, $r_{b_w}$, $\{sig_{sk_k}(b_w)\}_{k \in [n]}$) to
  $\mathcal{F}_{SC}$, all $P_i \in P$ compute a NIZK
  $NW_i \leftarrow NW\{x_{i1}, \ldots, x_{il} \mid (V_1 \neq 1 \wedge v_{i1} = Y_{i1}^{x_{i1}}) \vee \ldots \vee (V_l \neq 1 \wedge v_{il} = Y_{il}^{x_{il}})\}$
  showing that they are not the winner. Then they send (RECOVERY-PAYMENT, sid, $NW_i$)
  to $\mathcal{F}_{SC}$. The winner $P_w$ (in case it is identified) or all parties $P_i$ who do not act
  (in case $P_w$ is not identified) are identified as dishonest and lose their deposits,
  which are distributed among the honest parties.


Fig. 6. Protocol $\Pi_{FPA}$ (Stages 4 and Recovery)

Note that it is necessary to have the deposit amount at least equal to the bid.
Indeed, let $d$ be any deposit amount smaller than $b_i$. Then the utility of $\mathcal{A}$ by
making $P_i$ drop out of the protocol is $v_A - (d + work + b_j)$, while it is $v_A - b_i$ by
behaving honestly. Therefore, in case $d + work + b_j < b_i$, $\mathcal{A}$ prefers to deviate
from the protocol to increase her utility. A similar argument shows that in the
second-price auction, $\mathcal{A}$ always prefers to act honestly.
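The deposit-size argument can be checked with a small numeric sketch. The function names and the concrete numbers below are hypothetical; the formulas are the two utilities compared in the text for an adversary controlling $P_i$ and $P_j$ with $b_i > b_j$ and $b_j$ the second-highest bid.

```python
def honest_utility(v_a: int, b_i: int) -> int:
    """Adversary wins with b_i and pays it."""
    return v_a - b_i


def dropout_utility(v_a: int, b_j: int, deposit: int, work: int) -> int:
    """P_i drops out and forfeits deposit + work; P_j then wins and pays b_j."""
    return v_a - (deposit + work + b_j)


def deviation_pays(b_i: int, b_j: int, deposit: int, work: int) -> bool:
    """Deviating is profitable exactly when deposit + work + b_j < b_i."""
    return deposit + work + b_j < b_i
```

For instance, with $v_A = 100$, $b_i = 60$, $b_j = 20$ and work = 5, a deposit of 30 makes dropping out worth 45 against 40 for honest behaviour, while a deposit equal to the bid (60) makes it worth only 15.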


6 Complexity Analysis and Comparison to Other Protocols


In this section, we present concrete estimates for the computational and communication
complexity of our first and second-price auction protocols, i.e., $\Pi_{FPA}$
and $\Pi_{SPA}$, respectively. We show that, in the first-price case, $\Pi_{FPA}$ is more
efficient than the state-of-the-art protocol SEAL [3]. In the second-price case,
we show that $\Pi_{SPA}$ only incurs a small overhead (dominated by re-executing
one round) over $\Pi_{FPA}$. Note that the complexity of Stages 2 and 3 is based on
the NIZK constructions available in the full version [20].


Table 1. First-price auction computational complexity comparison in terms of
exponentiations performed by a party $P_i \in P$: $n$ is the number of parties, $l$ is the total
number of rounds in Stages 2 and 3 (i.e., the bit-length of bids), $\tau$ is the number of rounds
in Stage 2.


Protocol   Stage 1                     Stage 2             Stage 3                   Total
FAST       $nl + l + 8\log l + 2$      $\tau(8 + 10n)$     $(l - \tau)(19 + 22n)$    $23nl + 20l + 8\log l - 11\tau - 12n\tau + 2$
SEAL [3]   $11l + 12nl$                $\tau(17 + 20n)$    $(l - \tau)(33 + 36n)$    $48nl + 44l - 16\tau - 16n\tau$


Table 2. First-price auction communication complexity comparison in terms of
transmitted bits by a party $P_i \in P$: $n$ is the number of parties, $l$ is the total number of
rounds in Stages 2 and 3 (i.e., the bit-length of bids), $\tau$ is the number of rounds of
Stage 2, $|G|$ and $|Z_q|$ indicate the bit-length of elements $g \in G$ and $z \in Z_q$, respectively,
$\kappa$ is the security parameter, as defined in Sect. 2.


Protocol   Stage 1                                   Stage 2                   Stage 3                          Total
FAST       $n((2l + 10)|G| + 3\kappa + 4\log l)$     $n\tau(|G| + 6|Z_q|)$     $n(l - \tau)(|G| + 11|Z_q|)$     $n(|G|(3l + 10) + |Z_q|(11l - 5\tau) + 3\kappa + 4\log l)$
SEAL [3]   $17nl|G|$                                 $23n\tau|G|$              $36n(l - \tau)|G|$               $(53nl - 13n\tau)|G|$


The First-Price Case: A concrete estimate of computational complexity is
shown in Table 1 and one for communication complexity is shown in Table 2.
We estimate these concrete complexities in terms of the number of exponentiations
performed by a party $P_i$ and of the number of bits transmitted by a
party $P_i$ in an execution of protocol $\Pi_{FPA}$, respectively. Moreover, we compare
the complexity of our protocol with SEAL [3], which is the current state-of-the-art
protocol for first-price sealed-bid auctions. In a similar way to our protocol,
SEAL requires all parties to jointly compute the maximum bid bit-by-bit
and is subdivided into a Stage 1 devoted to the setup, a Stage 2 identifying
the rounds of the protocol before the first veto and a Stage 3 identifying the
rounds of the protocol after the first veto. Hence, we highlight the differences
in terms of complexity stage by stage. Note that, in order to make the communication
complexities of the two protocols comparable, both of them have been
expressed in terms of $|G|$. Finally, FAST has an additional Stage 4 guaranteeing
that the payment from the winning party $P_w$ to the auctioneer is executed. On
the other hand, SEAL does not guarantee this property. In particular, Stage 4
requires 1 exponentiation per party and has a communication complexity equal
to $2(n - 1)|G|$.
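The per-stage formulas of Table 1 can be evaluated for concrete parameters with a short sketch. Two assumptions are made here that the text does not state: the logarithm is taken to base 2, and the function names are hypothetical. The Stage 4 cost of FAST is left out so that the totals match the table.

```python
import math


def fast_exponentiations(n: int, l: int, tau: int) -> dict:
    """Per-party exponentiation counts for FAST, following Table 1."""
    log_l = math.log2(l)   # assumption: "log" in Table 1 is base 2
    stages = {
        "stage1": n * l + l + 8 * log_l + 2,
        "stage2": tau * (8 + 10 * n),
        "stage3": (l - tau) * (19 + 22 * n),
    }
    stages["total"] = sum(stages.values())
    return stages


def seal_exponentiations(n: int, l: int, tau: int) -> dict:
    """Per-party exponentiation counts for SEAL, following Table 1."""
    stages = {
        "stage1": 11 * l + 12 * n * l,
        "stage2": tau * (17 + 20 * n),
        "stage3": (l - tau) * (33 + 36 * n),
    }
    stages["total"] = sum(stages.values())
    return stages
```

For $n = 10$ parties, $l = 32$ bits and $\tau = 8$, the formulas give 6994 exponentiations for FAST against 15360 for SEAL, i.e., less than half under these assumptions.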


The Second-Price Case: The computational and communication complexities
of the proposed second-price auction are still linear in the number of parties. That
is, assuming that at round $r$ there is a party who is the only one that is vetoing,
the parties have to re-run the $r$-th round with one less party. More precisely,
by following the notation of Tables 1 and 2, let $\tau$ be the number of rounds in Stage
2, then the computational complexity of Stage 1 and Stage 2 is similar to the
first-price auction, that is $nl + l + 8\log l + 2$ for Stage 1, and $8\tau + 10n\tau$ for Stage 2.
Let $r$ be the number of rounds until there is only a single party who is vetoing.
Therefore the computational complexity of Stage 3 is $19r + 22nr$ until there is
only a single veto. After this, the parties have to run the protocol with one less
party, i.e., $n - 1$ parties. Depending on the bid structure of the remaining $n - 1$
parties, the protocol is either in Stage 2 or Stage 3. Let $\tau'$ denote the number
of rounds until the remaining $n - 1$ parties get a veto. Then the computational
complexity for these $\tau'$ rounds would be $8\tau' + 10(n - 1)\tau'$, and for the remaining
$l - (\tau + \tau' + r)$ rounds it would be $19(l - (\tau + \tau' + r)) + 22(n - 1)(l - (\tau + \tau' + r))$. Using
the same notation, a similar argument follows for the communication complexity
per party in the case of the second-price auction.


References 


1. Abe, M., Suzuki, K.: M + 1-st price auction using homomorphic encryption. In: 
Naccache, D., Paillier, P. (eds.) PKC 2002. LNCS, vol. 2274, pp. 115-124. Springer, 
Heidelberg (2002). https://doi.org/10.1007/3-540-45664-3_8 

2. Andrychowicz, M., Dziembowski, S., Malinowski, D., Mazurek, L.: Secure multi- 
party computations on bitcoin. In: 2014 IEEE Symposium on Security and Privacy, 
pp. 443-458. IEEE Computer Society Press, May 2014 

3. Bag, S., Hao, F., Shahandashti, S.F., Ray, I.G.: Seal: sealed-bid auction without 
auctioneers. IEEE Trans. Inf. Forensics Secur. 15, 2042-2052 (2019) 


4. Baudron, O., Stern, J.: Non-interactive private auctions. In: Syverson, P. (ed.) FC
2001. LNCS, vol. 2339, pp. 364-377. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-46088-8_28

5. Baum, C., David, B., Dowsley, R.: A framework for universally composable publicly
verifiable cryptographic protocols. IACR Cryptol. ePrint Arch. 2020, 207 (2020)

6. Baum, C., David, B., Dowsley, R.: Insured MPC: efficient secure computation
with financial penalties. In: Bonneau, J., Heninger, N. (eds.) FC 2020. LNCS, vol.
12059, pp. 404-420. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51280-4_22

7. Benhamouda, F., et al.: Can a public blockchain keep a secret? In: Pass, R.,
Pietrzak, K. (eds.) TCC 2020. LNCS, vol. 12550, pp. 260-290. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-64375-1_10

8. Bentov, I., Kumaresan, R.: How to use bitcoin to design fair protocols. In: Garay,
J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8617, pp. 421-439. Springer,
Heidelberg (2014). https://doi.org/10.1007/978-3-662-44381-1_24

9. Bentov, I., Kumaresan, R., Miller, A.: Instantaneous decentralized poker. In: Takagi,
T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10625, pp. 410-440.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70697-9_15

10. Bogetoft, P., Damgård, I., Jakobsen, T., Nielsen, K., Pagter, J., Toft, T.: A practical
implementation of secure auctions based on multiparty integer computation.
In: Di Crescenzo, G., Rubin, A. (eds.) FC 2006. LNCS, vol. 4107, pp. 142-147.
Springer, Heidelberg (2006). https://doi.org/10.1007/11889663_10

11. Brandt, F.: Fully private auctions in a constant number of rounds. In: Wright,
R.N. (ed.) FC 2003. LNCS, vol. 2742, pp. 223-238. Springer, Heidelberg (2003).
https://doi.org/10.1007/978-3-540-45126-6_16

12. Bünz, B., Bootle, J., Boneh, D., Poelstra, A., Wuille, P., Maxwell, G.: Bulletproofs:
short proofs for confidential transactions and more. In: 2018 IEEE Symposium on
Security and Privacy, pp. 315-334. IEEE Computer Society Press, May 2018

13. Camenisch, J., Stadler, M.: Proof systems for general statements about discrete
logarithms. Technical Report/ETH Zurich, Department of Computer Science, vol.
260 (1997)

14. Canetti, R.: Security and composition of multiparty cryptographic protocols. J.
Cryptol. 13(1), 143-202 (2000)

15. Cascudo, I., David, B.: ALBATROSS: publicly AttestabLe BATched randomness
based on secret sharing. In: Moriai, S., Wang, H. (eds.) ASIACRYPT 2020. LNCS,
vol. 12493, pp. 311-341. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64840-4_11

16. Cleve, R.: Limits on the security of coin flips when half the processors are faulty
(extended abstract). In: 18th ACM STOC, pp. 364-369. ACM Press, May 1986

17. Cramton, P., et al.: Spectrum auctions. In: Handbook of Telecommunications
Economics, vol. 1, pp. 605-639 (2002)

18. David, B., Dowsley, R., Larangeira, M.: Kaleidoscope: an efficient poker protocol
with payment distribution and penalty enforcement. In: Meiklejohn, S., Sako,
K. (eds.) FC 2018. LNCS, vol. 10957, pp. 500-519. Springer, Heidelberg (2018).
https://doi.org/10.1007/978-3-662-58387-6_27

19. David, B., Gaži, P., Kiayias, A., Russell, A.: Ouroboros praos: an adaptively-secure,
semi-synchronous proof-of-stake blockchain. In: Nielsen, J.B., Rijmen, V. (eds.)
EUROCRYPT 2018. LNCS, vol. 10821, pp. 66-98. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-78375-8_3

20. David, B., Gentile, L., Pourpouneh, M.: FAST: fair auctions via secret transactions.
Cryptology ePrint Archive, Report 2021/264 (2021). https://ia.cr/2021/264




Deuber, D., Döttling, N., Magri, B., Malavolta, G., Thyagarajan, S.A.K.: Minting 
mechanism for proof of stake blockchains. In: Conti, M., Zhou, J., Casalicchio, E., 
Spognardi, A. (eds.) ACNS 2020. LNCS, vol. 12146, pp. 315-334. Springer, Cham 
(2020). https://doi.org/10.1007/978-3-030-57808-4_16

Dreier, J., Dumas, J.-G., Lafourcade, P.: Brandt’s fully private auction protocol 
revisited. J. Comput. Secur. 23(5), 587-610 (2015) 

Franklin, M.K., Reiter, M.K.: The design and implementation of a secure auction 
service. IEEE Trans. Softw. Eng. 22(5), 302-312 (1996) 

Galal, H.S., Youssef, A.M.: Trustee: full privacy preserving Vickrey auction on top 
of ethereum. In: Bracciali, A., Clark, J., Pintore, F., Rønne, P.B., Sala, M. (eds.) 
FC 2019. LNCS, vol. 11599, pp. 190-207. Springer, Cham (2020). https://doi.org/ 
10.1007/978-3-030-43725-1_14 

Hao, F., Zieliński, P.: A 2-round anonymous veto protocol. In: Christianson, B., 
Crispo, B., Malcolm, J.A., Roe, M. (eds.) Security Protocols 2006. LNCS, vol. 
5087, pp. 202-211. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3- 
642-04904-0_28 

Ishai, Y., Ostrovsky, R., Zikas, V.: Secure multi-party computation with identifiable 
abort. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8617, pp. 
369-386. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-662-44381-1_21

Juels, A., Szydlo, M.: A two-server, sealed-bid auction protocol. In: Blaze, M. (ed.) 
FC 2002. LNCS, vol. 2357, pp. 72-86. Springer, Heidelberg (2003).
https://doi.org/10.1007/3-540-36504-4_6

Kiayias, A., Russell, A., David, B., Oliynykov, R.: Ouroboros: a provably secure 
proof-of-stake blockchain protocol. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. 
LNCS, vol. 10401, pp. 357-388. Springer, Cham (2017). https://doi.org/10.1007/ 
978-3-319-63688-7_12 

Kiayias, A., Zhou, H.-S., Zikas, V.: Fair and robust multi-party computation using 
a global transaction ledger. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 
2016. LNCS, vol. 9666, pp. 705-734. Springer, Heidelberg (2016).
https://doi.org/10.1007/978-3-662-49896-5_25

Klemperer, P.: Auctions: Theory and Practice. Princeton University Press, Prince- 
ton (2004) 

Krishna, V.: Auction Theory. Academic Press, Cambridge (2009) 

Kurosawa, K., Ogata, W.: Bit-slice auction circuit. In: Gollmann, D., Karjoth, 
G., Waidner, M. (eds.) ESORICS 2002. LNCS, vol. 2502, pp. 24-38. Springer, 
Heidelberg (2002). https://doi.org/10.1007/3-540-45853-0_2 

Lipmaa, H., Asokan, N., Niemi, V.: Secure Vickrey auctions without threshold 
trust. In: Blaze, M. (ed.) FC 2002. LNCS, vol. 2357, pp. 87-101. Springer,
Heidelberg (2003). https://doi.org/10.1007/3-540-36504-4_7

Mas-Colell, A., Whinston, M.D., Green, J.R., et al.: Microeconomic Theory, vol. 
1. Oxford University Press, New York (1995) 

Maxwell, G.: Confidential transactions (2016). https://people.xiph.org/~greg/ 
confidential_values.txt 

Miltersen, P.B., Nielsen, J.B., Triandopoulos, N.: Privacy-enhancing auctions using 
rational cryptography. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 
541-558. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-03356-8_32

Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008) 

Nurmi, H., Salomaa, A.: Cryptographic protocols for Vickrey auctions. Group 
Decis. Negot. 2(4), 363-373 (1993). https://doi.org/10.1007/BF01384489




Pass, R., Shi, E.: Hybrid consensus: efficient consensus in the permissionless model. 
In: Richa, A.W. (ed.) 31st International Symposium on Distributed Computing, 
DISC 2017, volume 91 of LIPIcs, Vienna, Austria, 16-20 October 2017, pp. 39:1- 
39:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017) 

Pedersen, T.P.: Non-interactive and information-theoretic secure verifiable secret 
sharing. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 129-140. 
Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-46766-1_9 

Vickrey, W.: Counterspeculation, auctions, and competitive sealed tenders. J. 
Finance 16(1), 8-37 (1961) 




Astrape: Anonymous Payment Channels 
with Boring Cryptography 


Yuhao Dong, Ian Goldberg, Sergey Gorbunov, and Raouf Boutaba


University of Waterloo, Waterloo, ON N2L 3G1, Canada 
yd2dong@uwaterloo.ca 


Abstract. The increasing use of blockchain-based cryptocurrencies like 
Bitcoin has run into inherent scalability limitations of blockchains. Pay- 
ment channel networks, or PCNs, promise to greatly increase scalability 
by conducting the vast majority of transactions outside the blockchain 
while leveraging it as a final settlement protocol. Unfortunately, first- 
generation PCNs have significant privacy flaws. In particular, even 
though transactions are conducted off-chain, anonymity guarantees are 
very weak. In this work, we present Astrape, a novel PCN construc- 
tion that achieves strong security and anonymity guarantees with simple, 
black-box cryptography, given a blockchain with flexible scripting. Exist- 
ing anonymous PCN constructions often integrate with specific, often 
custom-designed, cryptographic constructions. But at a slight cost to 
asymptotic performance, Astrape can use any generic public-key signa- 
ture scheme and any secure hash function, modeled as a random oracle, 
to achieve strong anonymity, by using a unique construction reminiscent 
of onion routing. This allows Astrape to achieve provable security that 
is “generic” over the computational hardness assumptions of the under- 
lying primitives. Astrape’s simple cryptography also lends itself to more 
straightforward security proofs compared to existing systems. 

Furthermore, we evaluate Astrape’s performance, including that of a 
concrete implementation on the Bitcoin Cash blockchain. We show that 
despite worse theoretical time complexity compared to state-of-the-art 
systems that use custom cryptography, Astrape operations on average 
have a very competitive performance of less than 10 ms of computation 
and 1 KB of communication on commodity hardware. Astrape explores
a new avenue to secure and anonymous PCNs that achieves similar or 
better performance compared to existing solutions. 


1 Introduction 


1.1 Payment Channel Networks 


Blockchain cryptocurrencies are gaining in popularity and becoming a signif- 
icant alternative to traditional government-issued money. For instance, over 
300,000 Bitcoin transactions alone [2] are processed every day. Unfortunately,
such high demand inevitably leads to well-known scalability barriers [8]. Bitcoin,
for instance, processes fewer than 10 transactions every second [20], far fewer
than a reasonable global payment system would need.

An extended version of this paper, and its accompanying source code, are available [12].

Payment channels [11] are a common technique to scale cryptocurrency trans- 
actions. In a nutshell, Alice and Bob open a payment channel by submitting a sin- 
gle transaction to the blockchain, locking up a sum of cryptocurrency from both of 
the parties. They can then pay each other by simply mutually signing a division of 
the locked money. Additional blockchain transactions are required only when the 
channel is closed by submitting an up-to-date signed division, unlocking the lat- 
est balances of Alice and Bob. This allows most activity to remain off-chain, while 
retaining the blockchain for final settlement: as long as the blockchain is secure, 
nobody can steal funds. More importantly, payment channels can be organized 
into payment-channel networks (PCNs) [20], where users without any open chan- 
nels between them can pay each other through intermediaries. 


1.2 Anonymity in PCNs 


Unfortunately, “first-generation” PCNs based on the HTLC (hash time-locked 
contract), such as Lightning Network [11], have a significant problem—poor 
anonymity [18]. In the worst case, HTLC payments are as transparently link- 
able as blockchain payments [18], threatening the improved privacy that is often 
cited [23,24] as a benefit of PCNs. Furthermore, naive implementations fall vic- 
tim to subtle fee-stealing attacks, like the “wormhole attack” [19], that threaten 
economic viability. 

A sizable body of existing work on fixing PCN security and privacy exists. 
On one hand, specialized constructions achieve strong anonymity in specific set- 
tings, such as Bolt [14] for hub-based PCNs on the Zcash blockchain, providing 
for indistinguishability of two concurrent transactions even when all intermedi- 
aries are malicious. On the other hand, general solutions for all PCN topolo- 
gies, like Fulgor [18] and the AMHL (Anonymous Multi-hop Locks) family [19], 
achieve a somewhat weaker, topology-dependent notion of anonymity: relation- 
ship anonymity [6,17]. This property, common to onion-routing and other anony- 
mous communication protocols, means that two concurrent transactions cannot 
be distinguished as long as they share at least one honest intermediary. 


1.3 Why Boring Cryptography?


Unfortunately, there remains a shortcoming common to all existing anonymous 
PCN constructions—custom, often number-theoretic and sometimes complex 
cryptographic primitives. No existing anonymous PCN construction limits itself 
to the bare-bones cryptographic primitives used in HTLC—black-box access to 
a generic signature scheme and hash function. For example, AMHL uses either
homomorphic one-way functions or special constructions that exploit the mathematical
structure of ECDSA or Schnorr signatures, and Tumblebit uses a custom
cryptosystem based on the RSA assumption. Blitz [5], though relying on an
ostensibly black-box signature scheme, requires it to have a property¹ that rules
out many post-quantum signature schemes.

However, it is unclear that relationship anonymity requires sophisticated 
techniques. Relationship anonymity appears to be relatively “easy” elsewhere. 
Well-understood anonymous constructs like onion routing and mix networks 
exist for communication with no more than standard primitives used in secure 
communication (symmetric and asymmetric encryption). Of course, communi- 
cation is probably easier—indeed some go beyond relationship anonymity with 
only simple cryptography—but it seems plausible that PCNs can use similarly 
elementary primitives to achieve anonymity. 

Furthermore, “boring” cryptography has practical advantages. For one, non- 
standard cryptography poses significant barriers to adoption. Reliable and per- 
formant implementations of novel cryptographic functions are difficult to obtain, 
and tight coupling between a PCN protocol and a particular cryptographic con- 
struction makes swapping out primitives impossible. With use of black-box cryp- 
tography, a system is generic over cryptographic hardness assumptions—instead 
of assuming that, say, the RSA or discrete-log problems are hard, we only need 
to assume that there exists, for example, some secure signature scheme and some 
secure hash function. 

Thus, we believe that efficient yet privacy-preserving PCNs that only use well- 
understood and easily replaced black-box cryptographic primitives are crucial 
to usable PCNs. In fact, AMHL’s authors already proposed that “an interesting 
question related to [anonymous PCN constructions] is under which class of hard 
problems such a primitive exists” [19] that they conjecturally answered with 
linear homomorphic one-way functions. 


1.4 Our Contributions 


In this paper, we present Astrape,² a PCN protocol that limits itself to “boring”,
generic cryptography already used in HTLC, yet achieves strong relationship 
anonymity. Despite achieving comparable security, privacy, and performance to 
other anonymous PCN constructions, Astrape does not introduce any crypto- 
graphic constructs other than those used in HTLC. This is accomplished using 
a novel construct reminiscent of onion routing that avoids the use of any form 
of zero-knowledge verification. 


¹ In particular, the ability for any party, given any public key, to generate new public
keys that correspond to the same private key yet are unlinkable to the previous
public key. This is crucial to the “stealth addresses” that Blitz’s pseudonymous
privacy rests upon.

² Greek for “lightning”, pronounced “As-trah-pee”.


2 Background and Related Work


2.1 First-Generation PCNs with HTLC


An extremely useful property of payment channels is that they can be used to
construct payment channel networks (PCNs) [8,10,20], allowing users without
channels directly between each other to pay each other via intermediaries. At the
heart of any PCN is a secure multi-hop transaction mechanism—some way of 
Alice paying Bob to pay Carol without any trust in Bob. Most PCNs implement 
this using a smart contract known as the Hash Time-Lock Contract (HTLC). 
An HTLC is parameterized over a sender Alice, the recipient Bob, a deadline t, 
and a puzzle s. It locks up a certain amount of money, unlocking it according to 
the following rules: 


- The money goes to Bob if he produces $\pi$ where $H(\pi) = s$ before time $t$, where
  $H$ is a secure hash function.
- Otherwise, the money goes to Alice.


We can use HTLC to construct secure multi-hop transactions. Consider a
sender $U_0$ wishing to send money to a recipient $U_n$ through untrusted
intermediaries $U_1, \ldots, U_{n-1}$. At first, $U_0$ will generate a random $\pi$ and $s = H(\pi)$, while
sending the pair $(\pi, s)$ to $U_n$ over a secure channel. $U_0$ can then lock money in
an HTLC parameterized over $U_0, U_1, s, t_1$, notifying $U_1$. $U_1$ would send an HTLC
over $U_1, U_2, s, t_2$, notifying $U_2$, and so on. The deadline must become earlier at
each step, that is, $t_1 > t_2 > \cdots > t_n$; this ensures that in case of an uncooperative or
malicious intermediary, funds always revert to the sender.

The payment eventually will be routed to $U_n$, who will receive an HTLC over
$U_{n-1}, U_n, s, t_n$. The recipient will claim the money by providing $\pi$; this allows $U_{n-1}$
to claim money from $U_{n-2}$ using the same $\pi$, and so on, until all outstanding HTLC
contracts are fulfilled. $U_0$ has successfully sent money to $U_n$, while the preimage
resistance of $H$ prevents any intermediary from stealing the funds.
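
The multi-hop HTLC flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not code from the paper: the `HTLC` class, the `payee` and `build_path` helpers, the use of SHA-256 for $H$, and the integer "clock" are all assumptions made for the example.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


def H(data: bytes) -> bytes:
    """Stand-in for the secure hash function H (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


@dataclass
class HTLC:
    sender: str      # gets the money back once the deadline has passed
    recipient: str   # gets the money if it shows a preimage of `puzzle` in time
    puzzle: bytes    # s = H(pi)
    deadline: int    # t, in abstract time units

    def payee(self, now: int, preimage: Optional[bytes] = None) -> Optional[str]:
        """Return who may take the money at time `now`, or None if nobody can yet."""
        if preimage is not None and now < self.deadline and H(preimage) == self.puzzle:
            return self.recipient
        if now >= self.deadline:
            return self.sender
        return None


def build_path(users: list[str], puzzle: bytes, first_deadline: int) -> list[HTLC]:
    """One HTLC per hop, with strictly decreasing deadlines t_1 > t_2 > ... > t_n."""
    return [
        HTLC(users[i], users[i + 1], puzzle, first_deadline - i)
        for i in range(len(users) - 1)
    ]


pi = secrets.token_bytes(32)   # secret chosen by the sender U_0
s = H(pi)                      # puzzle shared by every hop on the path
path = build_path(["U0", "U1", "U2", "U3"], s, first_deadline=10)

# The recipient reveals pi on the last lock; each earlier hop then reuses the same pi.
claims = [lock.payee(now=0, preimage=pi) for lock in reversed(path)]
```

Note that every lock on the path carries the same `puzzle`, which is exactly the common identifier that later sections set out to remove.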


2.2 Hub-Based Anonymous Payment Channels 


Unfortunately, HTLC has an inherent privacy problem: a common identifier
$s = H(\pi)$ visible to all nodes in the payment path [14,18,19]. This motivates
anonymous PCN design. Hub-based approaches form the earliest kind
of anonymous PCN design. Here, the shape of the network is limited to a 
star topology with users communicating with a centralized hub. Some solutions 
are highly specialized, such as Green and Miers’ Bolt [14], which relies on the 
Zcash blockchain’s zero-knowledge cryptography. Other solutions, such as Tum- 
blebit [15] and the more recent A2L [22], provide more general solutions that 
work on a wide variety of blockchains. 

Hub-based PCN constructions tackle the difficult problem of providing unlink- 
ability between transactions despite the existence of only a single untrusted inter- 
mediary. It is therefore unsurprising that specialized cryptography is needed to 
protect anonymity. On the other hand, observations of real-world PCNs like the 
Lightning Network, as well as economic analysis [13], show that actual PCNs often 
have intricate topologies without dominating hubs. General, topology-agnostic 
solutions are thus more important to deploying private PCNs in practice. 


2.3 Relationship-Anonymous Payment Channels 


Unlike hub-based approaches, where no intermediaries are trusted, general pri- 
vate PCN constructions target relationship anonymity. This concept, shared 
with onion routing and other anonymous communication protocols, assumes
at least one honest intermediary. Thus intermediaries are in fact crucial 
to relationship-anonymous PCNs’ privacy properties. Like most hub-based 
approaches, relationship-anonymous payment channels do not by themselves deal 
with information leaked by side channels such as timing and value. 

The earliest solution to PCN privacy in this family was probably Fulgor and 
Rayo [18], a closely related pair of constructions that can be ported to almost all 
HTLC-based PCNs. Fulgor/Rayo combines a “multi-hop HTLC” contract with 
out-of-band ZKPs to remove the common identifier across payment hops. 

In a later work, Malavolta et al. [19] introduced anonymous multi-hop locks 
(AMHL), a rigorous theoretical framework for analyzing private PCN contracts. 
The AMHL paper provided a concrete instantiation using linear homomorphic 
one-way functions (hOWFs), as well as a conjecture that hOWFs are necessary
for implementing anonymous PCNs. They also presented a variant that
uses a clever encoding of homomorphic encryption in ECDSA to be used in 
ECDSA-based cryptocurrencies like Bitcoin. The latter “scriptless” variant was 
generalized in later work to a notion of adaptor signatures [4], where a signa- 
ture scheme like ECDSA is “mangled” in such a way that a correct signature 
reveals a secret based on a cryptographic condition. The authors of AMHL also 
discovered “wormhole attacks” on HTLC-based PCNs. These attacks exploit a 
fundamental flaw in the HTLC construction to allow malicious intermediaries 
to steal transaction fees from honest ones, a problem that AMHL’s anonymity 
techniques also solve. 

More recently, Blitz [5] introduced one-phase payment channels that sup- 
port multi-hop payments without a two-phase separation of coin creation and 
spending, improving performance and reliability. Blitz also achieves stronger 
anonymity than HTLC, but its notion of anonymity is strictly weaker than 
the relationship anonymity of AMHL and Fulgor/Rayo. Other relationship- 
anonymous systems consider powerful adversaries that control most nodes and 
achieve indistinguishability of concurrent transactions, but Blitz considers local 
adversaries controlling a single intermediary and limits itself to hiding the rest 
of the path from this intermediary. 


3 Our Approach 


As we argued in Sect. 1.3, all of these existing solutions share an undesirable 
reliance on either custom cryptographic constructions or primitives with special 
properties, like Blitz’s stealth-address signature schemes. This causes inflexibil- 
ity, difficult implementation, and an inability to respond to cryptanalytic break- 
throughs like practical quantum computing. 

Astrape is our solution to this problem. We show that, with a novel design that
avoids the zero-knowledge verification paradigm, anonymous and atomic multi-hop
transactions can be constructed with nothing but the two building blocks of
HTLCs: hashing and signatures. Unlike existing work, no specific assumptions
about the structure of the hash function or signature scheme are made, allowing 
Astrape to be easily ported to different concrete cryptographic primitives and
its security properties to “fall out” from those of the primitives. This also allows 
Astrape to achieve high performance on commodity hardware using standard 
cryptographic libraries. 


3.1 Generalized Multi-hop Locks 


In our discussion of Astrape, we avoid describing the concrete details of a specific 
payment channel network and cryptocurrency. Instead, we introduce an abstract 
model—generalized multi-hop locks. This model readily generalizes to different 
families of payment channel networks. 

We model a sender, $U_0$, sending money to a receiver, $U_n$, through intermediaries
$U_1, \ldots, U_{n-1}$. We assume a “source routing” model, where the graph of all
valid payment paths in the network is publicly known and the sender can choose
any valid path to the recipient. After an initialization phase where the sender
may securely communicate parameters to each hop, each user $U_i$ where $i < n$
creates a coin and notifies $U_{i+1}$. This coin is simply a contract $\ell_{i+1}$, known as a
lock script, that essentially releases money to $U_{i+1}$ given a certain key $k_{i+1}$. We
call this lock the right lock of $U_i$ and the left lock of $U_{i+1}$.

Finally, the payment completes once all coins created in the protocol have
been unlocked and spent by fulfilling their lock scripts. Typically, this happens
through a chain reaction where the recipient’s left lock $\ell_n$ is unlocked, allowing
$U_{n-1}$ to unlock its left lock, etc.

Formally, we model a GMHL over a set of participants $U_i$ as a tuple of four
PPT algorithms $L = (\mathsf{Init}, \mathsf{Create}, \mathsf{Unlock}, \mathsf{Vf})$, defined as follows:


Definition 1. A GMHL $L = (\mathsf{Init}, \mathsf{Create}, \mathsf{Unlock}, \mathsf{Vf})$ consists of the following
polynomial-time protocols:

1. $\langle s^I_0, \ldots, (s^I_n, k_n) \rangle \leftarrow \langle \mathsf{Init}_{U_0}(1^\lambda, U_1, \ldots, U_n), \mathsf{Init}_{U_1}, \ldots, \mathsf{Init}_{U_n} \rangle$:
   the initialization protocol, started by the sender $U_0$, that takes in a security
   parameter $1^\lambda$ and the identities of all hops $U_i$ and returns an initial state $s^I_i$
   to all users $U_i$. Additionally, the recipient receives a key $k_n$.

2. $\langle (\ell_i, s^R_{i-1}), (\ell_i, s^L_i) \rangle \leftarrow \langle \mathsf{Create}_{U_{i-1}}(s^I_{i-1}), \mathsf{Create}_{U_i}(s^I_i) \rangle$:
   the coin-creating protocol run between two adjacent hops $U_{i-1}$ and $U_i$, creating
   the “coin sent from $U_{i-1}$ to $U_i$”. This includes a lock representation $\ell_i$ as well
   as additional state on both ends; unlocking the lock represented by $\ell_i$ releases
   the money.

3. $k_i \leftarrow \mathsf{Unlock}_{U_i}(\ell_{i+1}, (s^L_i, s^I_i, s^R_i), k_{i+1})$: the coin-spending protocol, run by each
   intermediary $U_i$ where $i < n$, obtains a valid unlocking key $k_i$ for the “left
   lock” $\ell_i$ given its “right lock” $\ell_{i+1}$, its unlocking key $k_{i+1}$ (already verified by
   $\mathsf{Vf}$ below), and $U_i$’s internal state.

4. $\{0,1\} \leftarrow \mathsf{Vf}(\ell, k)$: given a lock representation $\ell$ and an unlocking key $k$, return
   1 iff $k$ is a valid solution to the lock $\ell$.


As an example, a formalization of HTLC in the GMHL model can be found in
the extended version of this paper [12, App. A].
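
To make the shape of Definition 1 more concrete, the four algorithms can be written as an abstract Python interface. This is a rough sketch under several assumptions: the interactive protocols are collapsed into ordinary function calls, the names `GMHL`, `run_payment`, and `ToyHTLC` are hypothetical, and `ToyHTLC` is a simplified HTLC-like instantiation written for this example (with SHA-256 standing in for $H$), not the formalization given in [12].

```python
import hashlib
import secrets
from abc import ABC, abstractmethod
from typing import Any, Sequence

State = dict[str, Any]   # per-user state; its concrete shape is scheme-specific
Lock = Any
Key = Any


class GMHL(ABC):
    @abstractmethod
    def init(self, sec_param: int, users: Sequence[str]) -> tuple[list[State], Key]:
        """Init: one initial state per user U_0..U_n, plus the recipient's key k_n."""

    @abstractmethod
    def create(self, state_prev: State, state_cur: State) -> tuple[Lock, Any, Any]:
        """Create: run by U_{i-1} and U_i; returns lock l_i and both sides' extra state."""

    @abstractmethod
    def unlock(self, right_lock: Lock, state: State, right_key: Key) -> Key:
        """Unlock: derive k_i for the left lock from the already verified key k_{i+1}."""

    @abstractmethod
    def vf(self, lock: Lock, key: Key) -> bool:
        """Vf: True iff `key` is a valid solution to `lock`."""


def run_payment(scheme: GMHL, sec_param: int, users: Sequence[str]) -> bool:
    """Drive Init, Create, then the Unlock chain reaction from U_n back towards U_1."""
    n = len(users) - 1
    states, key = scheme.init(sec_param, users)
    locks: dict[int, Lock] = {}
    for i in range(1, n + 1):
        locks[i], right_state, left_state = scheme.create(states[i - 1], states[i])
        states[i - 1]["R"] = right_state   # U_{i-1}'s state about its right lock
        states[i]["L"] = left_state        # U_i's state about its left lock
    for i in range(n, 0, -1):
        if not scheme.vf(locks[i], key):   # key is k_i at this point
            return False
        if i > 1:
            key = scheme.unlock(locks[i], states[i - 1], key)   # U_{i-1} derives k_{i-1}
    return True


class ToyHTLC(GMHL):
    """Every lock asks for the same hash preimage, as in a plain HTLC path."""

    def init(self, sec_param, users):
        preimage = secrets.token_bytes(sec_param // 8)
        puzzle = hashlib.sha256(preimage).digest()
        return [{"I": puzzle} for _ in users], preimage

    def create(self, state_prev, state_cur):
        return state_prev["I"], None, None

    def unlock(self, right_lock, state, right_key):
        return right_key   # the same preimage opens every lock

    def vf(self, lock, key):
        return hashlib.sha256(key).digest() == lock


ok = run_payment(ToyHTLC(), 256, ["U0", "U1", "U2", "U3"])
```

Keeping the interface free of any channel or blockchain details mirrors the point made next: the model does not care how the locks are enforced.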
Generalizability to Non-PCN Systems. We note here that GMHL makes no
mention of typical PCN components such as channels, the blockchain, etc. This 
is because GMHL is actually agnostic of how exactly the locks are evaluated 
and enforced. In a typical PCN, these locks will be executed within bilateral 
payment channels, falling back to a public blockchain for final settlement. 

However, other enforcement mechanisms can be used. Notably, all the locks 
could simply be contracts directly executing on a blockchain. In this way, any 
anonymous PCN formulated in the GMHL model is equivalent to a specifica- 
tion for a provably anonymous on-chain, multi-hop coin tumbling service that 
can anonymize entirely on-chain payments by routing them through multiple 
intermediaries. 


Comparison to Existing Work. GMHL is an extension of anonymous multi-hop
locks, the model used in the eponymous paper by Malavolta et al. [19]. In partic- 
ular, AMHL defines an anonymous PCN construction in terms of the operations 
KGen, Setup, Lock, Rel, Vf, four of which correspond to GMHL functions. 

Although AMHL’s model is useful, we could not use it verbatim. This is 
largely because AMHL’s original definition [19] also included its security and 
privacy properties, while we wish to be able to use the same framework in a 
purely syntactic fashion to discuss PCNs with other security and anonymity 
goals. 

Nevertheless, GMHL can be considered as AMHL, reworded and used in a 
more general context. As we will soon see, Astrape’s desired security and privacy 
properties are actually very similar to those of AMHL, though we will consider 
other systems formulated in the GMHL framework along the way. Astrape can 
be considered an alternative implementation of the same “anonymous multi-hop 
locks” [19] construct. 


3.2 Security and Execution Model 


Now that we have a model to discuss PCN constructions, we can discuss our 
security model, as well as a model of the GMHL execution environment in which 
Astrape will execute. 


Active Adversary. We use a similar adversary model to that of AMHL [19].
That is, we model an adversary $\mathcal{A}$ with access to a functionality $\mathsf{corrupt}(U_i)$ that
takes in the identifier of any user $U_i$ and provides the attacker with the complete
internal state of $U_i$. The adversary will also see all incoming and outgoing
communication of $U_i$. $\mathsf{corrupt}(U_i)$ will also give the adversary active control of
$U_i$, allowing it to impersonate $U_i$ when communicating with other participants.


Anonymous Communication. We assume there is a secure and anonymous message
transmission functionality $\mathcal{F}_{anon}$ that allows any participant to send messages
to any other participant. Messages sent by an honest (non-corrupted) user
with $\mathcal{F}_{anon}$ hide the identity of the sender and cannot be read by the adversary,
although the adversary may arbitrarily delay messages.
There are many ways of implementing $\mathcal{F}_{anon}$, the exact choice of which is outside
the scope of this paper. One solution recommended by existing work [18,19]
is an onion-routing circuit constructed over the same set of users $U_i$, constructed
with a provably private protocol like Sphinx [9]. Public networks such as Tor may
also be used to implement $\mathcal{F}_{anon}$.


Exposed Lock Activity. In contrast to communication, lock activity—the content 
of all locks being created, as well as the unlocking keys during unlocking—is not 
secure. This is because in practice, lock activity often happens on public media 
like blockchains. We pessimistically assume that the adversary can see all lock 
activity, while a non-adversary only sees lock activity concerning locks that it 
sends and receives. 


Liveness and Timeouts. We assume that every coin lock $\ell_i$ comes with an appropriate
timeout that will return money to $U_{i-1}$ (i.e., the lock can be unlocked by a
signature from $U_{i-1}$ after the timeout) if $U_i$ does not take action. We also assume
that each left lock $\ell_i$’s deadline is at least $\delta$ later than that of the right lock $\ell_{i+1}$,
where $\delta$ is an upper bound on network latency between honest parties, even
under disruption by the adversary. In the most common setting of a PCN consisting
of bilateral payment channels backed by a blockchain, this is essentially
a blockchain censorship-resistance assumption. With a liveness assumption, we
can then omit timeout handling from the description of the protocol, in line with
related work (such as AMHL [19]).
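
The deadline condition above is easy to check mechanically. The helper below is a small illustrative sketch; the function name `deadlines_are_safe` and the use of plain integers for time are assumptions made for this example only.

```python
def deadlines_are_safe(deadlines: list[int], delta: int) -> bool:
    """Check the timeout assumption along a payment path.

    deadlines[k] is the deadline of lock l_{k+1}, so consecutive entries are a
    left lock and its right lock. Each left lock must expire at least `delta`
    later than the right lock that follows it.
    """
    return all(left >= right + delta for left, right in zip(deadlines, deadlines[1:]))


safe = deadlines_are_safe([30, 20, 10], delta=10)
unsafe = deadlines_are_safe([30, 25, 10], delta=10)
```

In the second call the first hop has only 5 time units of slack, less than the latency bound of 10, so an intermediary could see its right lock claimed too late to claim its own left lock.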


Infallible Lock Execution. We formulate Astrape in the GMHL model, and 
assume the existence of a mechanism that will guarantee that cryptocurrency 
locks are always correctly executed in the face of arbitrary adversarial activity. 
In practice, both bilateral payment channels falling back to a general-purpose 
public blockchain (like Ethereum) and direct use of this blockchain are good 
approximations of this mechanism. 


Lock Functionality. We assume that inside our on-chain contracts we are able 
to use at least the following operations: 


- Concatenation, producing a bitstring $x \| y$ of length $n + m$ from two bitstrings
  $x, y$, where $x$ has length $n$ and $y$ has length $m$.
- Bitwise XOR, producing a bitstring $x \oplus y$ from two bitstrings $x, y$


as well as the cryptographic hash function H defined below. An implication of 
this assumption is that PCNs on blockchains with highly restricted scripting 
languages, like Bitcoin, cannot use Astrape. 


Cryptographic Assumptions. One of Astrape’s main goals is to make minimal 
cryptographic assumptions. We assume only: 


- Generic cryptographic hash function. We assume a hash function $H$, modeled
  as a random oracle for the purpose of security proofs, producing $\lambda$ bits of
  output, where $\lambda$ is the security parameter. We use the random oracle both
  as a pseudorandom function and as a commitment scheme, which is well
  known [7] to be secure.

- Generic signature scheme. We assume a secure signature scheme that allows
  for authenticated communication between any two users $U_i$ and $U_j$.


3.3 Security and Privacy Goals 


Against the adversary we described above, we want to achieve the following 
security and privacy objectives: 


Relationship Anonymity. Given two simultaneous payments between different
senders $S_{\{0,1\}}$ and receivers $R_{\{0,1\}}$ with payment paths of the same length
intersecting at the same position in at least one honest intermediate user, an adversary
corrupting all of the other intermediate users cannot determine, with probability
non-negligibly better than 1/2 (guessing), whether $S_0$ paid $R_0$ and $S_1$
paid $R_1$, or $S_0$ paid $R_1$ and $S_1$ paid $R_0$. This is an established standard for
anonymity in payment channels [18,19] and is analogous to similar definitions
for anonymous communication [6,21]. It is important to note that the adversary
is not allowed to corrupt the sender, since senders always know who they are sending
money to.
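
Read as a guessing game, the informal definition above can be summarized as follows. This is only a sketch in notation assumed here for illustration: the challenge bit $b$, the adversary's guess $b'$, and the negligible function $\mathsf{negl}$ are not taken from the formal definitions in [18,19].

```latex
% b selects which of the two pairings is actually executed:
%   b = 0 : S_0 pays R_0 and S_1 pays R_1
%   b = 1 : S_0 pays R_1 and S_1 pays R_0
% The adversary observes the execution and outputs a guess b'.
\Pr\left[\, b' = b \,\right] \;\le\; \frac{1}{2} + \mathsf{negl}(\lambda)
```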


Balance Security. For an honest user $U_i$, if its right lock $\ell_{i+1}$ is unlocked, $U_i$ must
always be able to unlock its left lock $\ell_i$ even if all other users are corrupt. Combined
with the timeouts mentioned in our security model, this guarantees that
no intermediary node can lose money even if everybody else conspires against it.


Wormhole Resistance. We need to be immune to the wormhole attack on PCNs,
where malicious intermediaries steal fees from other intermediaries. The reason
why is rather subtle [19], but for our purposes this means that given an honest
sender and an honest intermediary $U_{i+1}$, $\ell_i$ cannot be spent by $U_i$ until $U_{i+1}$
spends $\ell_{i+1}$. Intuitively, this prevents honest intermediaries from being “left
out”.


4 Construction 


4.1 Core Idea: Balance Security + Honest-Sender Anonymity 


Unlike existing systems that utilize the mathematical properties of some cryp- 
tographic construction to build a secure and anonymous primitive, Astrape is 
constructed out of two separate broken constructions, both of which use boring 
cryptography and are straightforward to describe: 


- XorCake, which has relationship anonymity but lacks balance security if the
  sender $U_0$ is malicious.
- HashOnion, which has balance security but loses relationship anonymity
  in the Unlock phase. That is, an adversary limited only to observing Init
  and Create cannot break relationship anonymity, but an adversary observing
  Unlock can.


The key insight here is that if we can combine XorCake and HashOnion in 
such a way to ensure that HashOnion’s Unlock phase can only reveal informa- 
tion when the sender is malicious, we obtain a system, Astrape, that has both 
relationship anonymity and balance security. This is because the definition of 
relationship anonymity assumes an honest sender: if the sender is compromised, 
it can always simply tell the adversary the identity of its counterparty, breaking 
anonymity trivially. It is important to note that such a composition does not in 
any way weaken anonymity compared to existing “up-front anonymity” systems 
like AMHL, even in the most pessimistic case.³

We now describe XorCake and HashOnion, and their composition into 
Astrape. 


4.2 XorCake: Anonymous but Insecure Against Malicious Senders 


Let us first describe XorCake’s construction. XorCake is an extremely simple 
construction borrowed from “multi-hop HTLC”, a building block of Fulgor [18]. 
It has relationship anonymity, but not balance security against malicious senders. 

Recall that in GMHL, the sender ($U_0$) wishes to send a sum of money to the
recipient ($U_n$) through $U_1, \ldots, U_{n-1}$. At the beginning of the transaction, the
sender samples $n$ independent $\lambda$-bit random strings $(r_1, \ldots, r_n)$. Then, for all
$i \in \{1, \ldots, n\}$, she sets $n$ values $s_i = H(r_i \oplus r_{i+1} \oplus \cdots \oplus r_n)$, where $H$ is a secure
hash function. That is, $s_i$ is simply the hash of the XOR of all the values $r_j$
for $j \ge i$. $U_0$ then uses the anonymous channel $\mathcal{F}_{\mathrm{anon}}$ to provide $U_n$ the values
$(r_n, s_n)$ and all the other $U_i$ with $(r_i, s_i, s_{i+1})$.

Then, for each pair of neighboring nodes $(U_i, U_{i+1})$, $U_i$ sends $U_{i+1}$ a coin
encumbered by a regular HTLC $\ell_{i+1}$ asking for the preimage of $s_{i+1}$. $U_n$ knows
how to unlock $\ell_n$, and the solution would let $U_{n-1}$ unlock $\ell_{n-1}$, and so on. That
is, each lock $\ell_i$ is simply an HTLC contract asking for the preimage of $s_i$.

In the extended version [12, App. A], we give the formal definition of XorCake 
in the GMHL framework. 
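
The XorCake setup and unlock chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: BLAKE2b with a 32-byte digest stands in for $H$, $\lambda$ is taken to be 256 bits, and the function names `xorcake_setup` and `xorcake_unlock` are hypothetical, not taken from the authors' prototype.

```python
import hashlib
import secrets

LAMBDA_BYTES = 32  # assumption: lambda = 256 bits


def H(data: bytes) -> bytes:
    """Stand-in for the hash H (BLAKE2b with a 32-byte digest)."""
    return hashlib.blake2b(data, digest_size=LAMBDA_BYTES).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(p ^ q for p, q in zip(a, b))


def xorcake_setup(n: int):
    """Sender: sample r_1..r_n and derive s_i = H(r_i xor ... xor r_n)."""
    r = {i: secrets.token_bytes(LAMBDA_BYTES) for i in range(1, n + 1)}
    s = {}
    acc = bytes(LAMBDA_BYTES)  # running XOR r_i ^ ... ^ r_n, starts at zero
    for i in range(n, 0, -1):
        acc = xor(acc, r[i])
        s[i] = H(acc)
    return r, s


def xorcake_unlock(n: int, r: dict) -> dict:
    """Unlock right to left: k_n = r_n, then k_i = r_i xor k_{i+1}."""
    k = {n: r[n]}
    for i in range(n - 1, 0, -1):
        k[i] = xor(r[i], k[i + 1])
    return k


r, s = xorcake_setup(4)
k = xorcake_unlock(4, r)
# Every lock l_i (an HTLC on s_i) opens with the key derived from its right neighbor.
print(all(H(k[i]) == s[i] for i in range(1, 5)))
```

Each hop only needs its own $r_i$ and the key revealed by its right neighbor, which is why the locks look unrelated to anyone who does not hold the connecting $r_i$.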

XorCake by itself satisfies relationship anonymity. A full proof is available in
the Fulgor paper from which XorCake was borrowed [18], but intuitively this is
because $r_i$ will be randomly distributed over the space of possible strings because
$H$ behaves like a random oracle. This means that unlike in HTLC, no two nodes
$U_i$ and $U_j$ can deduce that they are part of the same payment path unless they
are adjacent.


3 In a sense then, Astrape has “pseudo-optimistic” anonymity. Its design superficially 
suggests an optimistic construction with an anonymous “happy path” and a non- 
anonymous “unhappy path”, but the latter non-anonymity is illusory—the sender 
can always prevent the “unhappy” path from deanonymizing the transaction even if 
all other parties are malicious. 
State-Mismatch Attack. Unfortunately, XorCake does not have balance security.
Consider a malicious sender who follows the protocol correctly, except for sending
an incorrect $r_i$ to $U_i$. (Note that $U_i$ cannot detect that $r_i$ is incorrect given a
secure hash function.) Then, when $\ell_{i+1}$ is unlocked, $\ell_i$ cannot be spent! In an
actual PCN such as the Lightning Network, all coins “left” of $U_i$ will time out,
letting the money go back to $U_0$. $U_0$ paid $U_n$ with $U_i$’s money instead of her
own. We call this the “state-mismatch attack”, and because of it, XorCake is
not a viable PCN construction on its own. In Fulgor, XorCake was combined
with out-of-band zero-knowledge proofs of the correctness of $r_i$, but as we will
see shortly, Astrape can dispense with them.


4.3 HashOnion: Secure but Eventually Non-anonymous 


We now present HashOnion, a PCN construction that has balance security 
but not relationship anonymity. Note that unlike HTLC, HashOnion’s non- 
anonymity stems entirely from information leaked in the Unlock phase, a prop- 
erty we will leverage to build a fully anonymous construction combining HashOn- 
ion and XorCake. 

At the beginning of the transaction, $U_0$ generates random values $s_i$ for $i \in
\{1, \ldots, n\}$, then “onion-like” values $x_i$, recursively defined as $x_i = H(s_i \| x_{i+1})$,
$x_n = H(s_n \| 0^\lambda)$.

Essentially, $x_i$ is a value that commits to all $s_j$ where $j \ge i$. An onion-like
commitment is used rather than a “flat” commitment (say, a hash of all $s_j$ where
$j \ge i$) as it is crucial for balance security, as we will soon see.

For all intermediate nodes $0 < i < n$, the sender sends $(x_{i+1}, s_i)$ to $U_i$, while
for the destination, the sender sends $s_n$. Then, each intermediary $U_{i-1}$ sends to
its successor $U_i$ a lock $\ell_i$, which can only be unlocked by some $k_i = (s_i, \ldots, s_n)$
where $H(s_i \| H(s_{i+1} \| H(\ldots H(s_n \| 0^\lambda)))) = x_i$. $U_{i-1}$ constructs this lock from the
$x_i$ it received from the sender. Finally, during the unlock phase, the recipient
$U_n$ solves $\ell_n$ with $k_n = (s_n)$. This allows each $U_i$ to spend $\ell_i$, completing the
transaction.

For balance security, we need to show that with a solution $k_{i+1} =
(s_{i+1}, \ldots, s_n)$ to $\ell_{i+1}$, and $s_i$, we can always construct a solution to $\ell_i$. This
is obvious: we just add $s_i$ to the solution: $k_i = (s_i, s_{i+1}, \ldots, s_n)$.

One subtle problem is that $U_i$ needs to make sure that its left lock is actually
the correct $\ell_i$ and not some bad $\ell_i'$ parameterized over some $x_i' \ne H(s_i \| x_{i+1})$.
Otherwise, its right lock might get unlocked with a solution that does not let
it unlock its left lock. Fortunately, this is easy: given $s_i, x_{i+1}$ from the sender,
$U_i$ can just check that its left lock, parameterized over some $\hat{x}_i$, matches $\hat{x}_i =
H(s_i \| x_{i+1})$ before sending out $\ell_{i+1}$ (parameterized with $x_{i+1}$) to the next hop.
Thus, every user can make sure that if its right lock is unlocked, so can its left
lock, so balance security holds.

We also see that although the unlocking procedure breaks relationship
anonymity by revealing all the $s_i$, before the unlock happens, HashOnion does
have relationship anonymity. This is because the adversary cannot connect the
different $x_i$ as long as one $s_i$ remains secret: that of the one honest intermediary.
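
The onion-like commitment and the "just prepend $s_i$" argument for balance security can be checked with a short Python sketch. It assumes BLAKE2b as $H$ and 32-byte values; the helper names (`hashonion_setup`, `opens`, `left_lock_ok`) are illustrative only.

```python
import hashlib
import secrets

L = 32            # assumption: lambda = 256 bits
ZERO = bytes(L)   # the all-zero string 0^lambda


def H(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=L).digest()


def hashonion_setup(n: int):
    """Sender: random s_1..s_n, x_n = H(s_n || 0), x_i = H(s_i || x_{i+1})."""
    s = {i: secrets.token_bytes(L) for i in range(1, n + 1)}
    x = {n: H(s[n] + ZERO)}
    for i in range(n - 1, 0, -1):
        x[i] = H(s[i] + x[i + 1])
    return s, x


def opens(x_i: bytes, key: tuple) -> bool:
    """Check a solution key = (s_i, ..., s_n) against the lock value x_i."""
    acc = ZERO
    for s_j in reversed(key):  # rebuild the onion from the innermost layer
        acc = H(s_j + acc)
    return acc == x_i


def left_lock_ok(x_hat: bytes, s_i: bytes, x_next: bytes) -> bool:
    """U_i's check before forwarding: left lock must equal H(s_i || x_{i+1})."""
    return x_hat == H(s_i + x_next)


n = 4
s, x = hashonion_setup(n)
key = (s[n],)                      # recipient's solution k_n
results = [opens(x[n], key)]
for i in range(n - 1, 0, -1):
    key = (s[i],) + key            # k_i = (s_i, s_{i+1}, ..., s_n)
    results.append(opens(x[i], key))
print(all(results))
```

The final key contains every $s_i$ on the path, which is exactly the information leak in the Unlock phase that the text describes.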
4.4 Securing XorCake+HashOnion


We now move on to composing XorCake and HashOnion. We do so by creating a 
variant of HashOnion that embeds XorCake and recognizes an inconsistency wit- 
ness. That is, this variant of HashOnion will unlock only when given a combina- 
tion of values that proves an attempt by the sender to execute a state-mismatch 
attack for XorCake. 

To construct such a lock, after generating the XorCake parameters, $U_0$ creates
$n$ $\lambda$-bit values $x_i$ recursively:


$$x_n = o_n, \qquad x_i = H(\underbrace{r_i \| s_i \| s_{i+1}}_{\text{XorCake parameters}} \| o_i \| x_{i+1})$$


where $o_i$ is a random nonce sampled uniformly from all possible $\lambda$-bit values.⁴
The intuition here is that $x_i$ commits to all the information $U_0$ would give to all
hops $U_j$ where $j \ge i$.

Afterwards, the sender uses $\mathcal{F}_{\mathrm{anon}}$ to send $(o_i, x_i, x_{i+1})$, in addition to
the XorCake parameters $(r_i, s_i, s_{i+1})$, to every hop $i$. Every hop $U_i$ checks that
all the parameters are consistent with each other.

We next consider what will happen if the sender attempts to fool an intermediate
hop $U_i$ with a state-mismatch attack. $U_{i+1}$ would unlock its left lock $\ell_{i+1}$
by giving $k_{i+1}$ where $H(k_{i+1}) = s_{i+1}$ but $H(r_i \oplus k_{i+1}) \ne s_i$. This then causes
$U_i$ to fail to unlock its left lock.

But this attempt allows $U_i$ to generate a cryptographic witness verifiable to
anybody knowing $x_i$: $\lambda$-bit values $k_{i+1}, r_i, s_i, s_{i+1}, o_i, x_{i+1}$ where:


$$H(k_{i+1}) = s_{i+1}, \quad H(r_i \oplus k_{i+1}) \ne s_i, \quad H(r_i \| s_i \| s_{i+1} \| o_i \| x_{i+1}) = x_i$$


This inconsistency witness proves that the preimage of $s_{i+1}$ XOR-ed with $r_i$
does not equal the preimage of $s_i$, demonstrating that the values given to $U_i$ are
inconsistent and that $U_0$ is corrupt. Since $U_{i-1}$ knows $x_i$, $U_i$ can therefore prove
that it was a victim of a state-mismatch attack to $U_{i-1}$.

Since $x_i$ commits to all XorCake initialization states “rightwards” of $U_i$, $U_i$,
in cooperation with $U_{i-1}$, can also produce a witness that $U_{i-2}$ can verify using
$x_{i-1}$. This is simply a set of $\lambda$-bit values $k_{i+1}, r_{i-1}, s_{i-1}, r_i, s_i, s_{i+1}, x_i, o_i, o_{i-1},
x_{i+1}$ where:

$$H(k_{i+1}) = s_{i+1}, \quad H(r_i \oplus k_{i+1}) \ne s_i,$$


$$H(r_i \| s_i \| s_{i+1} \| o_i \| x_{i+1}) = x_i, \quad H(r_{i-1} \| s_{i-1} \| s_i \| o_{i-1} \| x_i) = x_{i-1}$$


We can clearly extend this idea all the way back to $U_1$: given a witness
demonstrating a state-mismatch attack against $U_i$, $U_{i-1}$ can verify the witness
and generate a similar one verifiable by $U_{i-2}$, and so on. This forms the core
construction that Astrape uses to fix XorCake’s lack of balance security.
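
The three witness conditions for a single hop can be written down directly. The Python sketch below is a hedged illustration: BLAKE2b stands in for $H$, the base value $x_n$ is simply taken to be a random $\lambda$-bit string, and `build` and `verify_witness` are hypothetical names. Note that a cheating sender must commit to the corrupted $r_i$ inside $x_i$, since $U_i$ checks its parameters for consistency, and this is what makes the witness verifiable.

```python
import hashlib
import secrets

L = 32  # assumption: lambda = 256 bits


def H(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=L).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def build(n: int, tamper_at=None):
    """Sender-side parameters; optionally hand U_{tamper_at} a wrong r_i."""
    r = {i: secrets.token_bytes(L) for i in range(1, n + 1)}
    s, acc = {}, bytes(L)
    for i in range(n, 0, -1):
        acc = xor(acc, r[i])
        s[i] = H(acc)
    given = dict(r)
    if tamper_at is not None:
        given[tamper_at] = xor(r[tamper_at], b"\x01" * L)
    o = {i: secrets.token_bytes(L) for i in range(1, n + 1)}
    x = {n: o[n]}  # assumption: base of the recursion is a random value
    for i in range(n - 1, 0, -1):
        x[i] = H(given[i] + s[i] + s[i + 1] + o[i] + x[i + 1])
    return r, given, s, o, x


def verify_witness(x_i, k_next, r_i, s_i, s_next, o_i, x_next) -> bool:
    """True iff the values prove a state-mismatch attack against U_i."""
    return (
        H(k_next) == s_next                              # right lock was opened
        and H(xor(r_i, k_next)) != s_i                   # but left lock cannot be
        and H(r_i + s_i + s_next + o_i + x_next) == x_i  # and x_i commits to this
    )


r, given, s, o, x = build(3, tamper_at=2)
k3 = r[3]  # key revealed when U_3 unlocks l_3
print(verify_witness(x[2], k3, given[2], s[2], s[3], o[2], x[3]))
```

With an honest sender the second condition fails, so no witness exists and nothing beyond the normal XorCake unlock is revealed.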


⁴ $\|$ denotes concatenation. In our case, it is possible to unambiguously separate
concatenated values, since we only ever concatenate $\lambda$-bit values.
4.5 Complete Construction


We now present the complete construction of Astrape, as formalized in Fig. 1
within the GMHL framework. Note that we use the notation $\mathrm{Tag}[x_1, \ldots, x_n]$ to
represent an $n$-tuple of values with an arbitrary “tag” that identifies the type of
value.


function Init^{U_0}(1^λ, U_1, ..., U_n)
  Upon invocation by U_0:
    generate λ-bit random numbers {r_1, ..., r_n}
    x_n ← random λ-bit number
    for i in n, ..., 1 do
      s_i ← H(r_i ⊕ r_{i+1} ⊕ ··· ⊕ r_n)
      o_i ← random λ-bit number
      if i < n then
        x_i ← H(r_i || s_i || s_{i+1} || o_i || x_{i+1})
    for i in 1, ..., n do
      if i = n then
        send s^I_n = (k_n = HSoln[r_n], s_n) to U_n
      else
        send s^I_i = (r_i, s_i, s_{i+1}, x_i, x_{i+1}, o_i) to U_i

function Create^{U_i}(s^I_i = (r_i, s_i, s_{i+1}, x_i, x_{i+1}, o_i))
  Upon invocation by U_i, where i < n:
    if x_i ≠ H(r_i || s_i || s_{i+1} || o_i || x_{i+1}) then
      abort                                   ▷ bad initial state
    if i > 0 then
      wait for ℓ̂_i = Astrape[x̂_i, ŝ_i] to be created
      if x̂_i ≠ x_i or ŝ_i ≠ s_i then
        abort                                 ▷ invalid left lock
    return ℓ_{i+1} = Astrape[x_{i+1}, s_{i+1}]

function Unlock^{U_i}(ℓ_{i+1}, s^I_i, k_{i+1})
  Upon invocation by U_i, where i < n:
    parse s^I_i = (r_i, s_i, s_{i+1}, x_i, x_{i+1}, o_i)
    Γ_i ← r_i || s_i || s_{i+1} || o_i
    if parse k_{i+1} = HSoln[κ_{i+1}] then
      if H(r_i ⊕ κ_{i+1}) = s_i then
        return k_i = HSoln[r_i ⊕ κ_{i+1}]
      else
        return k_i = WSoln[κ_{i+1}, x_{i+1}, {Γ_i}]
    else
      parse k_{i+1} = WSoln[κ_j, x_j, {Γ_{i+1}, ..., Γ_j}]
      return k_i = WSoln[κ_j, x_j, {Γ_i, Γ_{i+1}, ..., Γ_j}]

function Vf(ℓ, k)
  parse ℓ = Astrape[x, s]
  if parse k = HSoln[κ] then
    return 1 iff H(κ) = s                     ▷ “normal” case
  else if parse k = WSoln[κ, χ, {Γ_i, ..., Γ_j}] then
    if ∃ i s.t. Γ_i.length ≠ 4λ bits then
      return 0
    parse Γ_j = r_j || s_j || s_{j+1} || o_j
    if H(r_j ⊕ κ) = s_j then
      return 0                                ▷ state good
    x̂ ← H(Γ_i || H(Γ_{i+1} || H(... H(Γ_j || χ))))
    return 1 iff x̂ = x                        ▷ “inconsistency” case

Fig. 1. Astrape as a GMHL protocol 


Initialization. In the first phase, represented as Init in GMHL, the sender $U_0$
first establishes communication to the $n$ hops $U_1, \ldots, U_n$, the last one of which
is the receiver. When talking to intermediaries and the recipient, $U_0$ uses our
abstract functionality $\mathcal{F}_{\mathrm{anon}}$.

The sender then generates random $\lambda$-bit strings $(r_1, \ldots, r_n)$ and $(o_1, \ldots, o_n)$,
deriving $s_i = H(r_i \oplus r_{i+1} \oplus \cdots \oplus r_n)$ and $x_i = H(r_i \| s_i \| s_{i+1} \| o_i \| x_{i+1})$; $x_n =
H(o_n)$. She then sends to each intermediate hop $U_i$ the tuple $s^I_i =
(r_i, s_i, s_{i+1}, x_i, x_{i+1}, o_i)$ and gives the last hop $U_n$ the initial state $s^I_n = (r_n, s_n)$
and the unlocking key $k_n = \mathrm{HSoln}[r_n]$.


Creating the Coins. We now move on to Create, where all the coins are initially
locked. $U_0$ first sends $U_1$ a coin encumbered with a lock represented as
$\ell_1 = \mathrm{Astrape}[x_1, s_1]$. When each hop $U_i$ receives a correctly formatted coin
from its previous hop $U_{i-1}$, it sends the next hop $U_{i+1}$ a coin with a lock
$\ell_{i+1} = \mathrm{Astrape}[x_{i+1}, s_{i+1}]$. Note that $U_i$ checks whether its left lock is consistent
with the parameters it received from $U_0$; this ensures that when $U_i$’s right lock
unlocks later, $U_i$ can always construct a solution for its left lock. If the checks
fail, Create aborts, and all of the locks will eventually time out (see Sect. 3.2),
returning money to the sender.

As specified in Vf, each of these locks $\ell_i$ can be spent either through solving
a XorCake-type puzzle to find the preimage of $s_i$ (the “normal” case) or by
presenting an inconsistency witness with a HashOnion-type witness demonstrating
$x_i$’s commitment to inconsistent data (the “inconsistency” case). After all the
transactions with Astrape-encumbered coins are sent, $U_n$ can finally claim its
money, triggering the next phase of the protocol.


Unlocking the Coins. The last step is Unlock. After receiving the final coin from
$U_{n-1}$, the recipient unlocks its lock $\ell_n$ by providing to Vf the preimage of the
HTLC puzzle: $k_n = \mathrm{HSoln}[r_n]$. This is the only way an honest recipient can claim
the money in a payment originating from an honest sender. Each intermediate
node $U_i$ reacts when its right lock $\ell_{i+1}$ is unlocked with key $k_{i+1}$:


— If $U_{i+1}$ solved the HTLC puzzle with $k_{i+1} = \mathrm{HSoln}[\kappa_{i+1}]$, construct $\kappa_i =
r_i \oplus \kappa_{i+1}$.

  • If $H(\kappa_i) = s_i$, this means that there is no state-mismatch attack happening.
    We unlock our left lock with $k_i = \mathrm{HSoln}[\kappa_i]$.

  • Otherwise, there must be an attack happening. We construct a witness
    and create a key that embeds the witness verifiable with $x_i$. This gives us
    $k_i = \mathrm{WSoln}[\kappa_{i+1}, x_{i+1}, \{\Gamma_i\}]$, where $\Gamma_i = r_i \| s_i \| s_{i+1} \| o_i$.

— Otherwise, $U_{i+1}$ demonstrated that the sender attempted to defraud some
$U_j$, where $j > i$, and unlocked $\ell_{i+1}$ by presenting an inconsistency witness $k_{i+1} =
\mathrm{WSoln}[\kappa_j, x_j, \{\Gamma_{i+1}, \Gamma_{i+2}, \ldots, \Gamma_j\}]$.

  • We can simply construct $k_i = \mathrm{WSoln}[\kappa_j, x_j, \{\Gamma_i, \Gamma_{i+1}, \ldots, \Gamma_j\}]$ where
    $\Gamma_i = r_i \| s_i \| s_{i+1} \| o_i$. This transforms the witness verifiable with $x_{i+1}$ to a
    witness verifiable with $x_i$.


Note that both cases are covered by Vf—it accepts and verifies both “nor- 
mal” unlocks with HSoln-tagged tuples, and “inconsistency” unlocks with WSoln. 
Thus, even though Astrape is a composition of XorCake and HashOnion, the 
final construction fully “inlines” the two into the same flow of initialization, coin 
creation, and unlocking, with no separate procedure to process inconsistency 
witnesses. Unlocking continues backwards towards the sender until all the locks 
created in the previous step are unlocked. We have balance security: node $U_i$
can unlock its left lock $\ell_i$ if and only if node $U_{i+1}$ has unlocked $\ell_{i+1}$, so no
intermediaries can lose any money.
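
The unlock flow and the two branches of Vf can be exercised end to end with the following Python sketch. It is a simplified model under explicit assumptions: BLAKE2b stands in for $H$; keys are plain tuples tagged `"H"` (HSoln) or `"W"` (WSoln); the second component of a WSoln key is taken to be the commitment immediately to the right of the last $\Gamma$; and `vf` additionally checks $H(\kappa) = s_{j+1}$, following the witness conditions of Sect. 4.4. All function names are illustrative.

```python
import hashlib
import secrets

L = 32  # assumption: lambda = 256 bits


def H(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=L).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def init(n: int, tamper_at=None):
    """Sender: per-hop states, locks, and the recipient's key HSoln[r_n]."""
    r = {i: secrets.token_bytes(L) for i in range(1, n + 1)}
    s, acc = {}, bytes(L)
    for i in range(n, 0, -1):
        acc = xor(acc, r[i])
        s[i] = H(acc)
    given = dict(r)
    if tamper_at is not None:  # state-mismatch attack on one intermediary
        given[tamper_at] = xor(r[tamper_at], b"\x01" * L)
    o = {i: secrets.token_bytes(L) for i in range(1, n)}
    x = {n: secrets.token_bytes(L)}  # x_n: random lambda-bit value
    for i in range(n - 1, 0, -1):
        x[i] = H(given[i] + s[i] + s[i + 1] + o[i] + x[i + 1])
    states = {i: (given[i], s[i], s[i + 1], x[i], x[i + 1], o[i])
              for i in range(1, n)}
    locks = {i: (x[i], s[i]) for i in range(1, n + 1)}
    return states, locks, ("H", r[n])


def vf(lock, key) -> bool:
    """Accept a normal HTLC solution or an inconsistency witness."""
    x, s = lock
    if key[0] == "H":
        return H(key[1]) == s                      # "normal" case
    _, kappa, chi, gammas = key
    if any(len(g) != 4 * L for g in gammas):
        return False
    last = gammas[-1]
    r_j, s_j, s_next = last[:L], last[L:2 * L], last[2 * L:3 * L]
    if H(kappa) != s_next or H(xor(r_j, kappa)) == s_j:
        return False                               # no attack demonstrated
    acc = chi
    for g in reversed(gammas):                     # rebuild nested commitment
        acc = H(g + acc)
    return acc == x                                # "inconsistency" case


def unlock(state, key_right):
    """U_i derives the key for its left lock from its right lock's key."""
    r, s, s_next, _x, x_next, o = state
    gamma = r + s + s_next + o
    if key_right[0] == "H":
        kappa = xor(r, key_right[1])
        if H(kappa) == s:
            return ("H", kappa)
        return ("W", key_right[1], x_next, [gamma])
    _, kappa, chi, gammas = key_right
    return ("W", kappa, chi, [gamma] + gammas)


def run(n: int, tamper_at=None):
    """Return (every lock opened?, tag of the key that opened l_1)."""
    states, locks, key = init(n, tamper_at)
    ok = [vf(locks[n], key)]
    for i in range(n - 1, 0, -1):
        key = unlock(states[i], key)
        ok.append(vf(locks[i], key))
    return all(ok), key[0]


print(run(4))               # honest sender: all locks open via HSoln
print(run(4, tamper_at=3))  # malicious sender: locks still open, via WSoln
```

In both runs every intermediary can open its left lock once its right lock has been opened; only the path taken through Vf differs.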

Security proofs, as well as a discussion on side-channel and griefing attacks,
can be found in the extended version [12, §5].
5 Blockchain Implementation


Astrape is easy to implement on blockchains with Turing-complete scripting 
languages, like Ethereum, as well as layer-2 PCNs such as Raiden built on these 
blockchains, but blockchains without Turing-complete scripts involve two main 
challenges. 

First, these blockchains typically do not allow recursion or loops in lock 
scripts. This means that we cannot directly implement the Vf function. Instead, 
we must “unroll” Vf to explicitly check for witnesses to inconsistencies in the 
parameters given to $U_i$, $U_{i+1}$, etc. So for an $n$-hop payment the size of every lock
script grows to O(n). In practice, the mean path length in the Lightning Network 
is currently around 5 (see our measurements in Sect. 6.4), and privacy-focused 
onion routing systems such as Tor or I2P typically use 3 to 5 hops. We believe 
linear-length script sizes are not a significant concern for Astrape deployment. 
An Astrape deployment can simply pick an arbitrary maximum for the number 
of hops supported and achieve reasonable worst-case performance. 

The second issue is more serious: some blockchains have so little scripting that 
Astrape cannot be implemented. Astrape requires an “append-like” operation 
|| that can take in two bytestrings and combine them in a collision-resistant 
manner. Unfortunately, Bitcoin, the largest blockchain, has disabled all
string-manipulation opcodes. Whether an implementation based solely on the 32-bit
integer arithmetic that Bitcoin uses is possible is an interesting open question. 


6 Comparison with Existing Work 


In this section, we compare Astrape with existing PCN constructions. First, we
compare Astrape’s design choices and features with those of other systems,
showing that it explores a novel design space. Then, we evaluate Astrape’s
concrete performance. We compare Astrape’s performance with that of other PCN
constructions, both anonymous and non-anonymous. Finally, we explore Astrape’s
performance on a real-world network graph from the Lightning Network.


Table 1. Comparison of different PCNs

               Topology   Anon^a    Efficient^b   Crypto
HTLC           Mesh       No        Yes           Sig. + hash
Tumblebit      Hub        Yes       No            Custom RSA
Bolt           Hub        Yes       Yes           NIZKP
Teechain       Hub        Yes       Yes           Trusted comp.
Fulgor/Rayo    Mesh       Yes       No            ZKP
AMHL_van^c     Mesh       Yes       Yes           Homom. OWF
AMHL_ecd       Mesh       Yes       Yes           ECDSA, …
AMHL_sch       Mesh       Yes       Yes           Schnorr sigs
Blitz          Mesh       Weak^d    Yes           SA^e sig. + hash
Astrape        Mesh       Yes       Yes           Sig. + hash

a Relationship anonymity
b Roughly comparable performance to HTLC. For example, ZKPs requiring many
  orders of magnitude more computation time than HTLC are not considered
  “efficient”.
c AMHL is a family of three closely related constructions. We denote by van,
  ecd, sch the “vanilla”, ECDSA, and Schnorr implementations respectively.
d See discussion in Section 2.3.
e A “stealth-address” signature scheme; i.e., a signature scheme where any
  party knowing a public key can generate unlinkably different public keys that
  correspond to the same private key.


6.1 Design Comparison


In Table 1, we compare Astrape’s properties with those of existing
payment channel networks. We see that except for HTLC, which does not achieve
anonymity, all previous PCN networks use cryptographic constructions specialized
for their use case. Furthermore, only more recent constructions achieve efficiency
comparable to HTLC. It is thus clear that Astrape is the first and only PCN
construction that works on all PCN topologies, achieves strong anonymity, and
performs at high efficiency, while using the same simple cryptography as HTLC.


Table 2. Resource usage of different PCN systems (n hops, c-byte HTLC contract,
d-byte Astrape contract); AMHL variants van, ecd, sch as in Table 1

                             Plain HTLC   Fulgor/Rayo   AMHL                Astrape (Bitcoin Cash)   Astrape
Comput. time (ms)            < 0.001      ≈ 200n        ≈ n (van)           ≈ 0.7n                   ≈ 0.25n
                                                        ≈ 3n (ecd)
                                                        ≈ 3n (sch)
Comm. size (bytes)           32n          1650000n      32 + 96n (van)      192n                     192n
                                                        416 + 128n (ecd)
                                                        256 + 128n (sch)
Lock (bytes)                 32 + c       32 + c        32 + c              108 + 39·n               64 + d
Unlock, normal case (bytes)  32           32            32 (van) / 64       32                       32
Unlock, worst case (bytes)   32           32            32 (van) / 64       64 + 128·n               64 + 128·n


6.2 Implementation and Benchmark Setup 


To demonstrate the feasibility and performance of our construction, we devel- 
oped a prototype implementation in the Go programming language. We imple- 
mented all the cryptographic constructions of Astrape inside a simulated GMHL 
model. We used the libsodium library’s implementation of the ed25519 [16] sig- 
nature scheme and blake2b [3] hash function. In addition, we generated script 
locks in Bitcoin Cash’s scripting language to illustrate script sizes for scripting 
languages with no loops. The Bitcoin Cash scripts, written in the higher-level 
CashScript language, can be found in the extended version [12, App. B]. 

All tests were done on a machine with a 3.2 GHz Intel Core i7 and 16 GB RAM. 
Network latency is not simulated, as this is highly application dependent. These 
conditions are designed to be maximally similar to those under which Fulgor [18] 
and AMHL [19] were evaluated, allowing us to compare the results directly. 


6.3 Resource Usage 


Our first set of tests compares Astrape’s resource usage to that of other PCN 
constructions. We compare both a simulation of Astrape and a concrete imple- 
mentation using Bitcoin Cash’s scripting language to traditional HTLC, Ful- 
gor/Rayo, and all three variants of AMHL. 

We summarize the results in Table 2, where n refers to the number of hops,
c to the size of an HTLC contract, and d to the size of an Astrape contract. We 
copy results for Fulgor/Rayo [18] and AMHL (ECDSA) [19] from their original 
sources, which use an essentially identical setup. 
Fig. 2. Overhead distribution


Computation Time. We measure computation time, with communication and 
other overhead ignored. The time measured is the sum of the CPU time taken 
by each hop, for all steps of the algorithm. We note that by eschewing non- 
standard cryptographic primitives, Astrape achieves lower computation times 
across the board compared to Fulgor/Rayo and AMHL. 


Communication Overhead. We also measure the communication overhead of 
each system. This is defined as all the data that needs to be communicated 
other than the locks and their opening solutions. For example, in Astrape, this 
includes all the setup information sent from Uo, while in AMHL this includes 
everything exchanged during the Setup, Lock, and Rel [19] phases. We see that 
Astrape has by far the least communication overhead of all the anonymous PCN 
constructions. Note especially the extreme overhead of the zero-knowledge proofs 
used in Fulgor/Rayo. 


Lock Overhead. The last measure is per-coin lock overhead—the size of each lock 
script (the “lock size”) and that of the information required to unlock it (the 
“unlock size”). This is a very important component of a system’s resource usage, 
since lock and unlock sizes directly translate into transaction fees in blockchain 
cryptocurrencies. Astrape’s performance differs in two important ways. 

First of all, Astrape’s Vf function is expressed in a recursive manner. In 
blockchains like Bitcoin Cash that support neither recursion nor loops in their 
scripting language, we must “unroll” the Vf implementation. This causes lock 
sizes to be linear in the number of hops. In blockchains with general-purpose 
scripting languages, though, lock size is generally constant. Second, the worst- 
case unlock size is larger for Astrape. When the sender is malicious and all coins 
have to be spent by invoking HashOnion, we need n parameters (I4,...,In) 
to unlock each coin for an n-hop payment. However, despite this asymptotic 
disadvantage, we believe that Astrape nevertheless offers competitive lock per- 
formance. This is because payment routes are quite short in practice, as we will 
shortly see. 
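
As a rough worked example, the per-coin formulas reported in Table 2 for the Bitcoin Cash implementation (lock size $108 + 39n$ bytes, worst-case unlock size $64 + 128n$ bytes) can be evaluated for a few path lengths. The function names below are hypothetical, and the choice of 5, 10, and 20 hops is ours, picked to bracket typical and long routes.

```python
def lock_bytes_unrolled(n: int) -> int:
    """Lock script size with Vf unrolled for n hops (Table 2, Bitcoin Cash)."""
    return 108 + 39 * n


def unlock_bytes_normal() -> int:
    """Unlock size when the sender is honest: a single 32-byte preimage."""
    return 32


def unlock_bytes_worst(n: int) -> int:
    """Worst-case unlock size: one 128-byte (4 x 32) Gamma value per hop."""
    return 64 + 128 * n


for hops in (5, 10, 20):
    print(hops, lock_bytes_unrolled(hops), unlock_bytes_worst(hops))
```

Even at 20 hops the worst-case unlock data stays under 3 KB per coin, and it is only needed when the sender misbehaves.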
6.4 Statistical Simulation


Finally, we simulate the performance of Astrape on the network graph of the 
Lightning Network (LN). 


Setup. We set up a mainnet Lightning Network node using the lnd [1] reference
implementation. We then used the lncli describegraph command to capture the
network topology of Lightning Network in February 2021. This gives us a graph 
of 9566 nodes and 72164 edges. Finally, we randomly sample 5,000 pairs of nodes 
in the network and calculate the shortest paths between them. This gives us a 
randomly sampled set of real-life payment paths. 
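
The sampling step can be sketched with a breadth-first search over an adjacency list. The code below is a minimal illustration, not the authors' tooling: the five-node graph is a made-up toy standing in for the captured topology, and the function names are hypothetical.

```python
import random
from collections import deque


def shortest_path_hops(adj: dict, src, dst):
    """Number of hops on a shortest path from src to dst, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def sample_path_lengths(adj: dict, samples: int, seed: int = 0) -> list:
    """Sample random node pairs and record shortest-path lengths.

    Unreachable pairs are skipped, so the graph must contain at least
    one connected pair or this loop will not terminate.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    lengths = []
    while len(lengths) < samples:
        a, b = rng.sample(nodes, 2)
        hops = shortest_path_hops(adj, a, b)
        if hops is not None:
            lengths.append(hops)
    return lengths


# Hypothetical toy topology (undirected, stored as adjacency lists).
toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
lengths = sample_path_lengths(toy, 1000)
print(sum(lengths) / len(lengths))  # mean path length of the sample
```

On the real graph, the same procedure would be run with the node and edge lists from the captured topology in place of `toy`.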


Path Statistics. As we have previously shown, paths more than 10 hops long 
still have fairly small overhead even with non-recursive lock scripts, but much 
longer paths will cause rather large unlock sizes. We examine whether the graph 
topology will force payments to grow too long; Fig. 2a illustrates the distribution 
of lengths for our 5,000 randomly selected payment paths. On average, a payment 
path was 5.12 hops long, though the Lightning Network specification allows up 
to 20 hops. This indicates that shortest payment paths long enough to pose 
seriously ballooning worst-case overhead are practically nonexistent. 


Total Scalability. One important attribute we wish to explore with the LN topol- 
ogy is the total scalability of the network—how fast can a PCN process trans- 
actions as a whole. 

To do so, we keep track of how many times each node appears, or is “hit”, in 
our 5,000 randomly selected payment routes. On average, this is 2.99, but the vast 
majority of nodes are hit only once, while a few nodes are hit hundreds of times. 
The distribution of hits is plotted in Fig. 2b as a log-linear histogram. We then 
look at the distribution of overhead in the network for both computation and 
communication. We do this by calculating the total computation and communication
cost for each node “hit” by the 5,000 random payments, using values from the
Bitcoin Cash implementation.

Computation cost is plotted in Fig. 2c. We see that the most heavily loaded 
node in the entire network did around 2,000 ms of computation to process 1,500 
transactions. This indicates that the largest hubs in a PCN with the current 
Lightning Network topology will be able to process around 750 transactions a 
second per CPU core. Such a throughput is orders of magnitude higher than that 
of typical blockchains and is within reach of many traditional payment systems. 
We note that this is only the maximum throughput of a single CPU core—in 
practice hubs likely have multicore machines, and with many hubs the total LN 
throughput will be many times this number. 
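
The per-core throughput figure follows directly from the two measured quantities for the busiest node:

```latex
\[
\frac{1{,}500\ \text{transactions}}{2{,}000\ \text{ms}}
  = 0.75\ \text{transactions/ms}
  = 750\ \text{transactions/s per CPU core}.
\]
```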

Communication cost is plotted in Fig. 2d. We pessimistically assume that
all payments are settled through HashOnion. Even so, the total network load
averages to only about 3.58 KB per node. The largest hubs’ total load still does
not exceed 1 MB. This illustrates that the bottleneck is actually computation,
not communication.




In summary, we see that Astrape’s worst-case asymptotic performance poses 
no barriers to the total throughput of an Astrape-powered payment channel 
network. PCNs can enjoy the superb scalability associated with them just as 
easily with Astrape-powered privacy and security. 


7 Conclusion 


First-generation payment channel networks and other trust-minimizing
intermediarized cryptocurrency payment systems lack strong privacy and security
guarantees. Existing research, although solving the privacy and security problems,
tends to rely on custom cryptographic primitives that cannot be easily swapped
with alternatives based on different computational hardness assumptions.

We presented Astrape, a novel PCN construction that breaks this conun- 
drum. Astrape is the first PCN that achieves relationship anonymity and bal- 
ance security with only black-box access to generic conventional cryptography. It 
relies on a general idea of using non-anonymous post-hoc inconsistency witnesses 
to achieve balance security, while avoiding any information leaks when senders 
are not corrupt. This allows Astrape to avoid dealing with the zero-knowledge 
verification used to achieve balance security in existing relationship-anonymous 
PCNs without sacrificing any anonymity or security properties. 

Furthermore, we demonstrate that Astrape is practical to deploy in the real 
world. Performance is superior on average to existing private PCNs, even on 
blockchains that are unsuitable for free-form smart contracts. We also showed 
that Astrape achieves high scalability on a real-world payment channel network 
graph. 
Abstract. In 2002, Chow et al. initiated the formal study of white-box
cryptography and introduced the CEJO framework. Since then, various
white-box designs based on their framework have been proposed, all of
them broken. Ranea and Preneel proposed a different method in 2020,
called self-equivalence encodings, and analyzed its security for AES. In
this paper, we apply this method to generate the first academic white-box 
SPECK implementations using self-equivalence encodings. Although we 
focus on SPECK in this work, our design could easily be adapted to protect 
other add-rotate-xor (ARX) ciphers. Then, we analyze the security of our 
implementation against key-recovery attacks. We propose an algebraic 
attack to fully recover the master key and external encodings from a 
white-box SPECK implementation, with limited effort required. While 
this result shows that the linear and affine self-equivalences of SPECK are 
insecure, we hope that this negative result will spur additional research 
in higher-degree self-equivalence encodings for white-box cryptography. 
Finally, we created an open-source Python project implementing our 
design, publicly available at https://github.com/jvdsn/white-box-speck. 
We give an overview of five strategies to generate output code, which can 
be used to improve the performance of the white-box implementation. 
We compare these strategies and determine how to generate the most 
performant white-box SPECK code. Furthermore, this project could be 
employed to test and compare the efficiency of attacks on white-box 
implementations using self-equivalence encodings. 


Keywords: White-box cryptography · Self-equivalence · SPECK


1 Introduction 


Traditionally, honest parties use cryptographic algorithms in combination with 
cryptographic keys to encrypt or decrypt messages. However, there are situations 
in which these keys must remain hidden in software, even from the party per- 
forming the encryption or decryption. In this case, the adversary has full control 
over the execution environment. As such, the implementation is a “white box” to 
the adversary. White-box cryptography is used to protect these implementations 
against key-recovery attacks. From an attacker’s perspective, reverse engineer-
ing and extracting a protected implementation of a cipher is less convenient 
compared to simply redistributing the keys. This implementation might also be 
restricted to a specific computing platform. As a result, white-box cryptogra- 
phy is a widely deployed method to protect private keys in the mobile banking 
industry and for digital rights management (DRM). 

In 2002, Chow et al. initiated the formal study of white-box cryptography 
in their seminal work [19]. They introduced the White-Box Attack Context, also 
called the white-box model. The white-box model has three main properties: 


— The attacker is a privileged user on the same host as the cryptographic algo- 
rithm, with complete access to the implementation. 

— The attacker can dynamically execute the cryptographic algorithm. 

— At any point before, during, or after the execution, the attacker is able to 
view and modify the internal details of the implementation. 


On top of this, they introduced the first academic framework (commonly 
called the CEJO framework) to generate protected implementations in the white- 
box model, based on the AES block cipher [21]. Shortly after publishing their 
work on AES, Chow et al. also applied their method to the protection of the DES 
block cipher [20]. Concurrently, a practical side-channel attack on the white-box 
DES implementation was published by Jacob et al., using Differential Fault 
Analysis [27]. However, this attack was not applicable to the AES implementa- 
tion protected using the CEJO framework. Still, it would take only two years for 
the initial AES implementation to be broken; in 2004, Billet et al. designed a 
practical key-recovery attack by analyzing the composition of the AES lookup 
tables [8]. 

The publication of these papers sparked more interest in the topic of white- 
box cryptography, with many new constructions based on DES [31] and AES 
[2,28,29,41,42] appearing over the years. Unfortunately, all of these implemen- 
tations have been broken, using both algebraic attacks [22, 24, 26,30,34,40] and 
attacks based on side-channel analysis [12,13,16]. All of these designs improved 
upon or were inspired by the CEJO framework. Consequently, this framework 
has been analyzed extensively. 

On the other hand, the work of Chow et al. also spurred research into entirely 
different types of constructions, using modified cipher designs [18] or completely 
new white-box ciphers [9, 14,15]. Often this includes different security goals, such 
as incompressibility or one-wayness [11]. Some of these new designs have enjoyed 
limited success, while others were quickly broken [23,35]. 

In 2016, McMillion et al. used a type of permutations called self-equivalences 
to construct a toy white-box implementation of AES [33]. A self-equivalence of 
a function is a pair of permutations which can be applied to the start and end of 
that function without changing the original behavior. McMillion et al. divided 
AES into substitution and permutation (affine) layers. Then, they computed 
the self-equivalences of the substitution layers and applied these self-equivalence 
encodings to the affine layers directly preceding and succeeding the substitu- 
tion layer. The resulting white-box implementation (also called a self-equivalence 
implementation) is a composition of substitution layers and encoded affine lay-
ers containing the round keys. In the same work, they also presented a practical 
attack to recover the cryptographic key from such implementations. 
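
The definition of a self-equivalence can be illustrated on a toy function. The Python sketch below treats 4-bit modular addition $F(x, y) = x + y \bmod 2^4$ as the function and checks exhaustively whether a candidate pair $(A, B)$ satisfies $B \circ F \circ A = F$. The specific maps and the constant are our own hypothetical example and are not the encodings studied in the cited works; the helper does not verify that $A$ and $B$ are permutations, which holds by construction here.

```python
from itertools import product

N_BITS = 4                   # toy word size, chosen so the check is exhaustive
MASK = (1 << N_BITS) - 1
C = 5                        # arbitrary constant for the example


def F(x: int, y: int) -> int:
    """Toy nonlinear layer: addition modulo 2^N_BITS."""
    return (x + y) & MASK


def is_self_equivalence(A, B) -> bool:
    """Check B(F(A(x, y))) == F(x, y) for every input pair."""
    return all(
        B(F(*A(x, y))) == F(x, y)
        for x, y in product(range(1 << N_BITS), repeat=2)
    )


def A_good(x, y):
    """Input map: add C to the first word (a permutation of input pairs)."""
    return ((x + C) & MASK, y)


def B_good(z):
    """Output map: subtract C again (a permutation of outputs)."""
    return (z - C) & MASK


def A_bad(x, y):
    """Flipping the low bit of x is not undone by the identity output map."""
    return (x ^ 1, y)


print(is_self_equivalence(A_good, B_good))
print(is_self_equivalence(A_bad, lambda z: z))
```

In a self-equivalence implementation, maps like `A_good` and `B_good` would be folded into the neighboring affine layers, so the substitution layer itself is left unchanged.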

The work of McMillion et al. received little attention, and was only recently
picked up by Ranea and Preneel [36]. They analyzed the white-box security of
substitution-permutation network (SPN) ciphers protected using self-equivalence 
encodings. They proposed a generic attack on such implementations, and proved 
that it is possible to recover the key from self-equivalence implementations of 
traditional SPN ciphers, if the S-box does not have differential and linear approx- 
imations with probability one. As cryptographically strong S-boxes are designed 
to resist differential [6] and linear [32] cryptanalysis, they showed that self- 
equivalence encodings are unsuitable to protect this class of traditional SPN 
ciphers. On the other hand, they also indicated that self-equivalence encodings 
might be of interest to protect ciphers with a better self-equivalence structure. 

One possible class of interesting ciphers are add-rotate-xor (ARX) ciphers, 
whose rounds consist of the three basic operations the name implies: modular 
addition, bitwise rotation, and bitwise XOR. Because ARX ciphers do not rely on 
cryptographically strong S-boxes to provide nonlinearity, they are not susceptible 
to the attack described by Ranea and Preneel. Furthermore, in [37], it was found 
that the n-bit modular addition has a number of self-equivalences exponential 
in n. As a result, ciphers employing the modular addition as their only source 
of nonlinearity are a promising target for research in white-box cryptography 
based on self-equivalence encodings. 


1.1 Contributions 


In this paper, we introduce the first academic method to protect SPECK
implementations using self-equivalence encodings. Let $n$ be the SPECK word size, and
$m$ the number of key words, that is, the key size divided by $n$ [3]. We start
by rewriting the SPECK encryption function $E_k$ as a substitution-permutation
network (SPN), a composition of affine layers $AL$ and substitution layers $SL$,
with the first and last affine layer having a special structure. To obtain the
self-equivalence implementation $\overline{E}_k$, we apply self-equivalence encodings
of $SL$ to each of the affine layers. Notably, this design could also be applied to
protect other ARX ciphers.

Then, we define the set of linear self-equivalences of $SL$ as $SE_L(SL)$ and the
set of affine self-equivalences of $SL$ as $SE_A(SL)$. Using a result from [37], we
can determine that $SE_L(SL)$ contains $3 \times 2^{2n+2}$ elements and $SE_A(SL)$
contains $3 \times 2^{2n+8}$ elements. To encode an affine layer, self-equivalences
are randomly sampled from $SE_L(SL)$ or $SE_A(SL)$. Provided that $n$ is large enough,
it would be infeasible for an attacker to brute-force the self-equivalence encodings
of an encoded affine layer.

However, we found that it is possible to efficiently recover the linear
self-equivalence encodings from an encoded affine layer by computing the Gröbner
basis of a system of equations. An attacker can then easily compute the round
keys, external encodings, and the SPECK master key. Additionally, we also analyzed
the security of affine self-equivalence encodings. We show that an attacker
can recover the affine self-equivalence encodings from an encoded affine layer up
to one free variable. Consequently, the attacker only has to try $2^{m+1}$ possible
configurations to break the white-box SPECK implementation. As $m$ is at most
4 in practice, only $2^5 = 32$ configurations need to be guessed.

We tested these attacks using our Python implementation on consumer hardware.
For $n = 64$, the largest SPECK word size available, we found that it took
only 16.08 s and 42.00 s to break the self-equivalence implementations when linear
and affine self-equivalence encodings were used, respectively. Unfortunately, we
must conclude that these self-equivalence encodings are trivially insecure in the
white-box model. Still, we hope that our method can be extended using higher-degree
self-equivalences in the future to produce a secure white-box SPECK
implementation.

We also created a Python implementation of our white-box SPECK method, 
capable of generating correct white-box SPECK code. This allows us to compare 
the performance impact of our design to an unprotected SPECK implementation. 
Because this impact is significant, we extend the program with strategies to 
generate more performant code. These strategies improve the execution speed 
of the matrix-vector product, one of the core functions in our implementations, 
and reduce the disk space required to store the binary matrices and vectors. 
We believe that these code generation strategies can be of independent interest 
to improve the performance of other mathematical computations relying on the 
storage of matrices and the computation of a matrix-vector product. In particu- 
lar, these improvements could be applied to self-equivalence implementations of 
other ARX ciphers. 

Finally, we compare an unprotected, reference SPECK implementation, an 
unoptimized white-box SPECK implementation, and the code generated by 
the different code generation strategies for three different SPECK variants: 
SPECK32/64, SPECK64/128, and SPECK128/256. The results show that the
bit-packed and SIMD code generation strategies provide the most efficient code,
both in terms of disk space usage and execution time. However, these strategies 
still pale in comparison to the unprotected, reference SPECK implementation, 
which is 5.4 times smaller and 24.8 times faster than the most efficient white- 
box implementations. 

White-box cryptography is a hard problem, and over the years many white- 
box designs have been proposed and broken. While many new designs are based 
on the CEJO framework [20], we attempt to build on the comparatively recent 
method using self-equivalences [33]. Even though the results show our design is 
insecure for SPECK, we hope that this work can still be a useful stepping stone 
in the study of self-equivalence encodings for white-box cryptography. 

A full version of this paper, which includes attack results and performance 
details for additional SPECK configurations, can be found at [39]. 


Outline. In Sect. 2, we define some preliminary notation and concepts that will be
reused throughout this text. We introduce our approach to apply self-equivalence
encodings to SPECK in Sect. 3. Then, in Sect. 4, we analyze the security of our
white-box SPECK implementation using linear and affine self-equivalence encodings.
In Sect. 5, we give an overview of our Python project to generate white-box
SPECK implementations using self-equivalence encodings and a comparison of five
additional strategies to improve the performance of the generated code. Lastly,
Sect. 6 contains the conclusions and future work.


2 Preliminaries 


In general, lowercase symbols in this paper refer to numbers and vectors, while
uppercase symbols are used to denote functions and matrices. In particular, $E$
and $D$ will be used to denote encryption and decryption functions, respectively.
On top of this, we use $E_k$ and $D_k$ to refer to encryption and decryption
functions with a hard-coded key, $k$.

Finite fields with $q$ elements are written as $\mathbb{F}_q$. We will only work with
the finite field with two elements, $\mathbb{F}_2$. Vectors over this field are called
binary vectors, while matrices over $\mathbb{F}_2$ are called binary matrices. More
specifically, binary vectors in the vector space $\mathbb{F}_2^n$ are called $n$-bit
vectors. The addition in $\mathbb{F}_2$ is denoted using $\oplus$, and we extend this
to the addition of $n$-bit vectors by pairwise addition of each element. Finally, as
a shorthand, we will sometimes replace $\oplus\, c$ by $\oplus_c$ if $c$ is a constant.

A function $F: \mathbb{F}_2^n \to \mathbb{F}_2^m$ is called an $(n, m)$-bit function.
If $n = m$, then we simply call these functions $n$-bit functions. We use $\circ$ to
refer to the composition of functions.

An important operation in this paper is the modular addition, defined as the
addition of two numbers $x$ and $y$, modulo some power of two. We use $\boxplus$ to
refer to the modular addition, and $\boxminus$ to refer to its inverse, the modular
subtraction. Lastly, $x \ggg \alpha$ denotes a right bitwise circular shift of $x$ by
$\alpha$ positions and $x \lll \beta$ denotes a left bitwise circular shift of $x$ by
$\beta$ positions.


2.1 Self-equivalences 


We briefly introduce the definitions of linear and affine self-equivalences, and
their matrix and matrix-vector forms.


Definition 1 (Linear self-equivalence, [10]). Let $F$ be an $(n, m)$-bit function.
Let $A$ be an $n$-bit linear permutation and $B$ be an $m$-bit linear permutation.
If $F = B \circ F \circ A$, we call the pair $(A, B)$ a linear self-equivalence of $F$.


Because $A$ and $B$ are linear functions, they could be given in the form of
$n \times n$ and $m \times m$ matrices, respectively. In that case, we say $(A, B)$
is a linear self-equivalence of $F$ in matrix form.


Definition 2 (Affine self-equivalence, [10]). Let $F$ be an $(n, m)$-bit function.
Let $A$ be an $n$-bit linear permutation, $a$ an $n$-bit constant, $B$ an $m$-bit
linear permutation, and $b$ an $m$-bit constant. Together, $(A, a)$ and $(B, b)$
describe affine permutations. If $F = (\oplus_b \circ B) \circ F \circ (\oplus_a \circ A)$,
we call the pair $((A, a), (B, b))$ an affine self-equivalence of $F$, or just a
self-equivalence of $F$.


Similarly, $A$, $a$, $B$, and $b$ could be given in the form of $n \times n$ and
$m \times m$ matrices, and vectors of length $n$ and $m$, respectively. In that case,
we say $((A, a), (B, b))$ is an (affine) self-equivalence of $F$ in matrix-vector
form. Of course, linear self-equivalences are also affine self-equivalences, with
$a$ and $b$ equal to the zero vector.

In this paper, we will mostly work with the matrix and matrix-vector forms 
of self-equivalences. This allows us to precisely specify the self-equivalences we 
are using, as well as manipulate these matrices and vectors using basic linear 
algebra. 
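
To make Definitions 1 and 2 concrete, the following minimal sketch checks the condition $F = B \circ F \circ A$ exhaustively for a toy modular-addition layer. The word size of 4 bits, the function names, and the two hand-picked self-equivalences are assumptions made for illustration; they are not taken from the sets generated by the method of [37].

```python
from itertools import product

N = 4                    # toy word size; an assumption to keep the search exhaustive
MASK = (1 << N) - 1
MSB = 1 << (N - 1)


def SL(x, y):
    """Substitution layer built from modular addition: (x, y) -> (x + y mod 2^N, y)."""
    return (x + y) & MASK, y


def lin(x, y):
    """Linear permutation: XOR the least significant bit of y into the top bit of x."""
    return x ^ ((y & 1) << (N - 1)), y


def aff_in(x, y):
    """Affine permutation with A = identity and a = (MSB, MSB)."""
    return x ^ MSB, y ^ MSB


def aff_out(x, y):
    """Affine permutation with B = identity and b = (0, MSB)."""
    return x, y ^ MSB


def is_self_equivalence(F, A, B):
    """Exhaustively check F == B o F o A on every pair of N-bit words."""
    return all(B(*F(*A(x, y))) == F(x, y)
               for x, y in product(range(1 << N), repeat=2))


print(is_self_equivalence(SL, lin, lin))         # linear pair (A, B) = (lin, lin)
print(is_self_equivalence(SL, aff_in, aff_out))  # affine pair ((A, a), (B, b))
```

Both pairs work because XORing the top bit of a word is the same as adding $2^{N-1}$ modulo $2^N$, so the change made on the input side can be undone by a linear or affine map on the output side.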


2.2 Speck 


SPECK is a family of lightweight block ciphers proposed by the National Security 
Agency in 2013 [3]. In particular, SPECK was designed with a focus on perfor- 
mance in software. In this paper, we also use “the SPECK (block) cipher” to refer 
to the general design of the SPECK family. 

The SPECK family consists of ten different instances, depending on the block 
size and key size parameters. The block size refers to the size in bits of the 
input, internal state, and output. These values always consist of two words, x 
and y, with bit size n. The key size refers to the size in bits of the master key 
k, which consists of m key words, with bit size n. We use the block size and key 
size in a shorthand notation to refer to specific SPECK instances. For example, 
SPECK128/256 refers to a SPECK instance with block size 128 and key size 256. 


3 Self-equivalences and Speck 


This section describes how self-equivalences can be used to create a white-box 
implementation of SPECK¹. Being an ARX cipher, the SPECK encryption function
is commonly written as a composition of the basic operations: modular addition, 
bitwise rotation, and bitwise XOR. However, to properly use self-equivalences in 
our design, SPECK needs to be rewritten as a repeated composition of non-linear 
and affine layers, similar to a substitution-permutation network (SPN). In the 
case of SPECK, the non-linear layers will contain the modular addition, and the 
affine layers contain the bitwise rotation, bitwise XOR, and round keys. 

Then, we introduce the definition of an encoding: a permutation applied 
to the start or the end of a function F, to hide the original behavior of F. 
Encodings can be applied to the round functions of a block cipher to create 
encoded implementations, a type of white-box implementations [19]. 

We use a special type of encodings, based on self-equivalences of the modular
addition, to encode the affine layers of SPECK. The start of the first affine layer
and the end of the last affine layer are encoded using random permutations,
called external encodings. When these encodings are applied to all affine layers
of SPECK, we obtain a self-equivalence implementation, a different type of
white-box implementation [36]. Note the difference with encoded implementations:
in an encoded implementation, entire round functions are encoded; in a
self-equivalence implementation, only the affine layers are encoded.


¹ Our method will focus on protecting the SPECK encryption function, but this design
could easily be adapted to the SPECK decryption function.

Let us start by rewriting the SPECK encryption function as a substitution- 
permutation network (SPN). First, we define the encryption function of an SPN. 


Definition 3 (SPN encryption function). Let $E_k$ be an encryption function
which takes a plaintext $m$ and encrypts this plaintext using key $k$ to produce
ciphertext $c$. Then $E_k$ represents the encryption function of a
substitution-permutation network if $E_k$ can be decomposed into affine layers $AL$
and substitution layers $SL$ as follows:

$$E_k = AL^{(n_r)} \circ SL \circ \cdots \circ AL^{(2)} \circ SL \circ AL^{(1)}$$

In addition, we call $SL \circ AL^{(r)}$ an SPN encryption round $E^{(r)}$.


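We can now show that the SPECK encryption function can also be written as a
combination of SPN encryption rounds. Let $E_k$ be the encryption function of the
SPECK cipher consisting of $n_r$ rounds with word size $n$. $E_k$ can be decomposed
into affine layers $AL$ and substitution layers $SL$:

$$E_k = AL^{(n_r)} \circ SL \circ \cdots \circ AL^{(1)} \circ SL \circ AL^{(0)}$$

with:

$$\begin{aligned}
SL(x, y) &= (x \boxplus y,\; y), \\
AL^{(0)}(x, y) &= (x \ggg \alpha,\; y), \\
AL^{(r)}(x, y) &= ((x \oplus k^{(r)}) \ggg \alpha,\; (x \oplus k^{(r)}) \oplus (y \lll \beta)), \quad \text{for } 1 \le r \le n_r - 1, \\
AL^{(n_r)}(x, y) &= (x \oplus k^{(n_r)},\; (x \oplus k^{(n_r)}) \oplus (y \lll \beta)).
\end{aligned}$$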


This result is also shown visually in Fig. 1. Here, two SPECK rounds are
shown in sequence, with the dotted lines indicating the affine layers separated
by modular additions. Evidently, this can be extended to $n_r$ SPECK rounds,
resulting in $n_r + 1$ affine layers, where layers 0 and $n_r$ have a special structure.
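
As a sanity check on this decomposition, the sketch below compares a textbook ARX round loop with the SPN form (affine layer 0, then alternating substitution and affine layers). The function names are our own, and the round keys are random placeholders rather than the output of the SPECK key schedule, since only the equality of the two forms is being illustrated.

```python
import random


def ror(v, r, n):
    """Rotate the n-bit word v right by r positions."""
    return ((v >> r) | (v << (n - r))) & ((1 << n) - 1)


def rol(v, r, n):
    """Rotate the n-bit word v left by r positions."""
    return ((v << r) | (v >> (n - r))) & ((1 << n) - 1)


def speck_arx(x, y, round_keys, n, alpha, beta):
    """Usual ARX description: rotate, add, XOR key, rotate, XOR."""
    mask = (1 << n) - 1
    for k in round_keys:
        x = ((ror(x, alpha, n) + y) & mask) ^ k
        y = rol(y, beta, n) ^ x
    return x, y


def speck_spn(x, y, round_keys, n, alpha, beta):
    """SPN description: AL^(0), then SL followed by AL^(r) for r = 1..n_r."""
    mask = (1 << n) - 1
    x = ror(x, alpha, n)                     # AL^(0)
    for r, k in enumerate(round_keys, start=1):
        x = (x + y) & mask                   # SL
        x ^= k                               # AL^(r): add the round key to x
        y = x ^ rol(y, beta, n)              # AL^(r): update y
        if r < len(round_keys):
            x = ror(x, alpha, n)             # rotation is absent in AL^(n_r)
    return x, y


rng = random.Random(2022)
keys = [rng.getrandbits(16) for _ in range(22)]
print(speck_arx(0x6574, 0x694C, keys, 16, 7, 2) == speck_spn(0x6574, 0x694C, keys, 16, 7, 2))
```

The parameters in the last line (16-bit words, rotations 7 and 2, 22 rounds) are those of SPECK32/64; other variants use rotations 8 and 3.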

In the previous definitions of $AL$, the SPECK state consists of two $n$-bit
variables $x$ and $y$. However, the self-equivalences of $SL$ are $2n$-bit affine
permutations, which operate on vectors of length $2n$ with elements in $\mathbb{F}_2$.
To be able to apply these self-equivalences to $AL$, we need to rewrite $AL$ as
$2n$-bit affine permutations operating on a $2n$-bit state vector $xy$:

$$\begin{aligned}
AL^{(0)} &= R_\alpha, \\
AL^{(r)} &= R_\alpha \circ X \circ L_\beta \circ \oplus_{k^{(r)}}, \quad \text{for } 1 \le r \le n_r - 1, \\
AL^{(n_r)} &= X \circ L_\beta \circ \oplus_{k^{(n_r)}}.
\end{aligned}$$


Fig. 1. Diagram of two SPECK encryption rounds, with affine layers indicated using 
dotted lines. 


Here, $xy$ contains the bits of $x$ and $y$ in little-endian order, $R_\alpha$
represents a right circular shift of $x$ by $\alpha$ bits, $L_\beta$ represents a left
circular shift of $y$ by $\beta$ bits, and $X$ represents the bitwise XOR operation
such that $y = x \oplus y$. Finally, $k^{(r)}$ is treated here as a vector of length
$2n$ containing the bits of the round key in the first $n$ positions and zero in the
last $n$ positions.
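
The following sketch illustrates how an intermediate affine layer can be turned into a binary matrix and a constant vector. The packing of the state as `x | (y << N)`, the column-wise matrix storage, and all function names are assumptions for illustration, not the layout used in the authors' project.

```python
import random

N, ALPHA, BETA = 16, 7, 2      # SPECK32 parameters, used here as an example
MASK = (1 << N) - 1


def ror(v, r):
    return ((v >> r) | (v << (N - r))) & MASK


def rol(v, r):
    return ((v << r) | (v >> (N - r))) & MASK


def affine_layer(state, k):
    """Word-level AL^(r), 1 <= r <= n_r - 1, on a packed 2N-bit state."""
    x, y = state & MASK, state >> N
    x ^= k
    y = x ^ rol(y, BETA)
    x = ror(x, ALPHA)
    return x | (y << N)


def to_matrix_vector(k):
    """Return (columns of M, v) such that AL(s) = M s + v over F_2."""
    v = affine_layer(0, k)                       # constant part
    cols = [affine_layer(1 << j, k) ^ v          # image of each basis vector
            for j in range(2 * N)]
    return cols, v


def apply(cols, v, state):
    """Matrix-vector product over F_2: XOR the columns selected by the state bits."""
    out = v
    for j, col in enumerate(cols):
        if (state >> j) & 1:
            out ^= col
    return out


rng = random.Random(1)
k = rng.getrandbits(N)
cols, v = to_matrix_vector(k)
s = rng.getrandbits(2 * N)
print(apply(cols, v, s) == affine_layer(s, k))
```

Note that only the constant `v` depends on the round key, while the matrix is the same for every intermediate round. This is why the attacks in Sect. 4 treat the unencoded matrix as known.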

To protect the key material in $AL^{(r)}$, we need to encode the affine layers. Let
us first introduce the definition of an encoding.


Definition 4 (Encoding, [36]). Let $F$ be an $(n, m)$-bit function and let $(I, O)$
be a pair of $n$-bit and $m$-bit permutations, respectively. The function
$\overline{F} = O \circ F \circ I$ is called an encoded $F$, and $I$ and $O$ are called
the input and output encoding, respectively.


In our design, the encodings $I$ and $O$ will mainly be self-equivalences of
SL when an affine layer is encoded. Therefore, we call these encodings self- 
equivalence encodings. However, the input encoding of the first affine layer and 
the output encoding of the last affine layer must be random affine permutations, 
called the external encodings. It is critical to the security of white-box implemen- 
tations that these external encodings are generated at random and kept secret 
from the attacker. Without external encodings, our design would be trivially 
insecure [19]. Now, we define the encoded affine layers. 


Definition 5 (Encoded affine layer, [36]). Let $AL^{(r)}$ be an affine layer of
the SPECK cipher, with $1 \le r \le n_r$. Then we call $\overline{AL}^{(r)}$ an encoded
affine layer, with:

$$\overline{AL}^{(r)} = (\oplus_{o^{(r)}} \circ O^{(r)}) \circ AL^{(r)} \circ (\oplus_{i^{(r)}} \circ I^{(r)}),$$

where $((O^{(r)}, o^{(r)}), (I^{(r+1)}, i^{(r+1)}))$ is a self-equivalence of the SPECK
substitution layer $SL$, and $(I^{(1)}, i^{(1)})$ and $(O^{(n_r)}, o^{(n_r)})$ are random
affine permutations.


Note that $AL^{(0)}$ will not be encoded: this affine layer does not contain any key
material, so it can be skipped.

If the self-equivalences composed with each $AL^{(r)}$ are sampled randomly from
a set of self-equivalences, the unencoded affine layer $AL^{(r)}$ cannot be recovered
without knowledge of $(I^{(r)}, i^{(r)})$ and $(O^{(r)}, o^{(r)})$. This effectively
hides the round keys inside the affine layers, and is the basis of our method to
protect SPECK implementations using self-equivalence encodings. Moreover, this
process could easily be adapted to other ARX ciphers. When all affine layers of a
SPECK encryption function are encoded using self-equivalence encodings and external
encodings, we obtain a self-equivalence implementation of SPECK.


Definition 6 (Self-equivalence implementation, [36]). Let $E_k$ be the
encryption function of the SPECK cipher consisting of $n_r$ rounds with word size
$n$. We call $\overline{E}_k$ a self-equivalence implementation of SPECK, with:

$$\overline{E}_k = \overline{AL}^{(n_r)} \circ SL \circ \cdots \circ \overline{AL}^{(1)} \circ SL \circ AL^{(0)}$$


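We can show that a self-equivalence implementation of SPECK, $\overline{E}_k$, is
functionally equivalent to $E_k$, up to the external encodings. Due to the
self-equivalence property, the intermediate encodings are canceled:

$$\begin{aligned}
\overline{E}_k &= \overline{AL}^{(n_r)} \circ SL \circ \cdots \circ \overline{AL}^{(1)} \circ SL \circ AL^{(0)} \\
&= (\oplus_{o^{(n_r)}} \circ O^{(n_r)}) \circ AL^{(n_r)} \circ SL \circ \cdots \circ AL^{(1)} \circ (\oplus_{i^{(1)}} \circ I^{(1)}) \circ SL \circ AL^{(0)} \\
&= (\oplus_{o^{(n_r)}} \circ O^{(n_r)}) \circ AL^{(n_r)} \circ SL \circ \cdots \circ AL^{(1)} \circ SL \circ AL^{(0)} \circ (\oplus_{i^{(1)}} \circ I^{(1)}) \\
&= (\oplus_{o^{(n_r)}} \circ O^{(n_r)}) \circ E_k \circ (\oplus_{i^{(1)}} \circ I^{(1)}).
\end{aligned}$$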


This property is also illustrated in Fig. 2, for two encryption rounds. The dashed
lines indicate the substitution layer $SL$ surrounded by its self-equivalence, which
can simply be reduced to $SL$. The encoded affine layers $\overline{AL}^{(r)}$ and
$\overline{AL}^{(r+1)}$ are marked by dotted lines.


Fig. 2. Diagram of two SPECK SPN encryption rounds encoded using self-equivalences. 
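
The cancellation of intermediate encodings can be observed on a toy instance. The sketch below is a simplified illustration under several assumptions: a 4-bit word size with rotations of 1, two rounds with arbitrary keys, a single hand-picked linear self-equivalence of $SL$ (not sampled with the method of [37]), an identity input external encoding, and a toy linear output external encoding.

```python
from itertools import product

N, ALPHA, BETA = 4, 1, 1      # toy parameters, chosen so the check is exhaustive
MASK = (1 << N) - 1
K1, K2 = 0b1011, 0b0110       # arbitrary toy round keys


def ror(v, r):
    return ((v >> r) | (v << (N - r))) & MASK


def rol(v, r):
    return ((v << r) | (v >> (N - r))) & MASK


def AL0(s):
    x, y = s
    return ror(x, ALPHA), y


def SL(s):
    x, y = s
    return (x + y) & MASK, y


def AL(s, k, last):
    """Affine layer with round key k; the final layer omits the rotation of x."""
    x, y = s
    x ^= k
    y = x ^ rol(y, BETA)
    return (x if last else ror(x, ALPHA)), y


def se(s):
    """Both halves (O^(1) and I^(2)) of a linear self-equivalence of SL."""
    x, y = s
    return x ^ ((y & 1) << (N - 1)), y


def ext_out(s):
    """Toy linear output external encoding O^(2)."""
    x, y = s
    return y, x ^ y


def E(s):
    """Unprotected two-round cipher."""
    return AL(SL(AL(SL(AL0(s)), K1, False)), K2, True)


def AL1_enc(s):
    return se(AL(s, K1, False))           # O^(1) o AL^(1), with I^(1) = identity


def AL2_enc(s):
    return ext_out(AL(se(s), K2, True))   # O^(2) o AL^(2) o I^(2)


def E_enc(s):
    """Self-equivalence implementation built from the encoded layers."""
    return AL2_enc(SL(AL1_enc(SL(AL0(s)))))


states = list(product(range(1 << N), repeat=2))
print(all(E_enc(s) == ext_out(E(s)) for s in states))
```

The encoded layers differ from the plain ones, yet the composition agrees with the unprotected cipher followed by the output external encoding, because `se`, `SL`, `se` applied in sequence collapse to `SL`.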


3.1 Self-equivalences of SL 


We use the method described in [37] to generate the self-equivalences of SL. This 
allows us to randomly sample both linear and affine self-equivalences. For more 
information, we refer the reader to Sect. 5.2 of [37]. 

We call the sets of linear and affine self-equivalences generated using this
method $SE_L(SL)$ and $SE_A(SL)$, respectively. When $n > 2$,
$|SE_L(SL)| = 3 \times 2^{2n+2}$ and $|SE_A(SL)| = 3 \times 2^{2n+8}$. This is important
for the security of our method to protect SPECK implementations: the number of
self-equivalences should be as high as possible to prevent a simple brute-force
key-recovery attack. For $n = 64$, the largest SPECK word size, this would result in
$3 \times 2^{130}$ and $3 \times 2^{136}$ possibilities for linear and affine
self-equivalences, respectively, enough to resist a naive brute-force attack. In the
next section, we introduce a more extensive security analysis of self-equivalence
encodings and show an attacker can still recover self-equivalences without resorting
to brute-force.
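
As a quick arithmetic check of these counts, the short sketch below evaluates both formulas for the five SPECK word sizes. The helper names are our own.

```python
WORD_SIZES = (16, 24, 32, 48, 64)   # SPECK word sizes n


def num_linear(n):
    """|SE_L(SL)| = 3 * 2^(2n + 2)."""
    return 3 * 2 ** (2 * n + 2)


def num_affine(n):
    """|SE_A(SL)| = 3 * 2^(2n + 8)."""
    return 3 * 2 ** (2 * n + 8)


for n in WORD_SIZES:
    print(f"n = {n}: linear 3 * 2^{2 * n + 2}, affine 3 * 2^{2 * n + 8}")
```

For every word size, the affine set is exactly $2^6 = 64$ times larger than the linear set.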


4 Security Analysis 


This section analyzes the security of our white-box SPECK design. The security of 
white-box implementations can be expressed in many different ways. Most com- 
monly, the goal of the attacker is to extract the cryptographic key from a pro- 
vided implementation (key extraction). However, other security notions include 
one-wayness and incompressibility. A detailed analysis of white-box cryptogra- 
phy security goals is presented by Bock et al. in [11]. In this paper, we focus on 
the fundamental white-box security feature: resistance to key-recovery attacks. 

In our analysis, we will evaluate the security of our white-box SPECK method 
from an algebraic perspective. Although self-equivalence encodings are generated 
at random, they are not completely random linear or affine transformations. We 
will try to exploit the additional structure of $SE_L(SL)$ and $SE_A(SL)$ to reduce
the brute-force search space of possible self-equivalence encodings and recover 
key bits. Moreover, to fully compromise the security, we will also need to recover 
the external encodings from the white-box implementation. Unlike the attack 
introduced by Ranea and Preneel in [36], which is based on equivalence problems 
and not applicable to SPECK, we analyze self-equivalence equations in bits. In 
the broader context of the white-box model, our approach is quite simple: we 
only require access to the encoded affine layers of the implementation. 

To perform a key-recovery attack on the white-box SPECK implementation,
we need to recover the master key $k$ from the self-equivalence implementation
$\overline{E}_k$. Unfortunately, the self-equivalence implementation only contains
protected versions of the round keys, $k^{(r)}$. As a result, recovering $k$ directly
is not possible, so computing $k$ from some recovered $k^{(r)}$ is a crucial part of a
successful key-recovery attack. Luckily, the SPECK key schedule is invertible, and
$k$ can be computed easily, using only the first $m$ round keys. Let $n$ be the SPECK
word size, and $m$ the number of key words, that is, the key size divided by $n$ [3].
Suppose $k^{(1)}, \ldots, k^{(m)}$ are known, then compute:

$$\begin{aligned}
l^{(r+m-1)} &= (k^{(r)} \lll \beta) \oplus k^{(r+1)}, \\
l^{(r)} &= ((l^{(r+m-1)} \oplus r) \boxminus k^{(r)}) \lll \alpha.
\end{aligned}$$

Combining $l^{(m-1)}, \ldots, l^{(1)}$, and $k^{(1)}$, we obtain the master key $k$.

Note that this approach can be extended to reconstruct $k$ using any sequence
of $m$ consecutive round keys, by working backwards to compute the preceding
$k^{(r)}$ and $l^{(r)}$ values.
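
The inversion can be sketched in a few lines. This example follows the 0-based indexing of the SPECK specification [3] (round keys $k_0, k_1, \ldots$ and round constant $i$), which may differ from the indexing used in the equations above; the function names are our own.

```python
import random


def ror(v, r, n):
    return ((v >> r) | (v << (n - r))) & ((1 << n) - 1)


def rol(v, r, n):
    return ((v << r) | (v >> (n - r))) & ((1 << n) - 1)


def expand_key(l, k0, rounds, n, alpha, beta):
    """SPECK key schedule. l = [l_0, ..., l_{m-2}]; returns [k_0, ..., k_{rounds-1}]."""
    mask = (1 << n) - 1
    l = list(l)
    k = [k0]
    for i in range(rounds - 1):
        l.append(((k[i] + ror(l[i], alpha, n)) & mask) ^ i)   # l_{i+m-1}
        k.append(rol(k[i], beta, n) ^ l[-1])                  # k_{i+1}
    return k


def recover_master_key(first_keys, n, alpha, beta):
    """Invert the schedule from the first m round keys; returns ([l_0..l_{m-2}], k_0)."""
    mask = (1 << n) - 1
    m = len(first_keys)
    l = []
    for i in range(m - 1):
        l_new = rol(first_keys[i], beta, n) ^ first_keys[i + 1]         # l_{i+m-1}
        l.append(rol(((l_new ^ i) - first_keys[i]) & mask, alpha, n))   # l_i
    return l, first_keys[0]


rng = random.Random(7)
n, alpha, beta, m, rounds = 64, 8, 3, 4, 34       # SPECK128/256 parameters
l_words = [rng.getrandbits(n) for _ in range(m - 1)]
k0 = rng.getrandbits(n)
round_keys = expand_key(l_words, k0, rounds, n, alpha, beta)
print(recover_master_key(round_keys[:m], n, alpha, beta) == (l_words, k0))
```

Only rotations, one XOR with the round counter, and one modular subtraction per key word are needed, so this step is negligible compared to recovering the round keys themselves.
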
4.1 Security Analysis of Linear Self-equivalences


We start the analysis of the white-box method for SPECK by looking at a variant 
where all encodings, both self-equivalence encodings and external encodings, are 
linear. Although linear encodings are significantly weaker than affine encodings in 
terms of security, they are also conceptually easier to understand. Furthermore, 
the analysis of this weaker version might give us some initial insights into the
security of a more secure variant using affine encodings. 

In this section, we will focus on a single intermediate affine layer of an encoded
SPECK encryption function $\overline{E}_k$. For the sake of convenience, we repeat the
definition of an encoded affine layer for round $r$ (see Definition 5) here:

$$\overline{AL}^{(r)} = (\oplus_{o^{(r)}} \circ O^{(r)}) \circ AL^{(r)} \circ (\oplus_{i^{(r)}} \circ I^{(r)}) \tag{1}$$

Because we only consider linear encodings for now, $i^{(r)}$ and $o^{(r)}$ are zero
vectors. Consequently, Eq. (1) can be simplified to:

$$\overline{AL}^{(r)} = O^{(r)} \circ AL^{(r)} \circ I^{(r)} \tag{2}$$


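This encoded affine layer will be stored as a combination of an encoded matrix
$\overline{M}^{(r)}$ and an encoded vector $\overline{v}^{(r)}$:

$$\begin{aligned}
\overline{M}^{(r)} &= O^{(r)} M^{(r)} I^{(r)}, \quad \text{for } 1 \le r \le n_r, \\
\overline{v}^{(r)} &= O^{(r)} v^{(r)}, \quad \text{for } 1 \le r \le n_r.
\end{aligned}$$

For each round $r$, $M^{(r)}$ represents the known linear operations of the affine
layer, while $v^{(r)}$ is the constant of the affine layer. However, as $v^{(0)}$ does
not contain any key material, this round is not protected using self-equivalences.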

To hide the key material in $v^{(r)}$, $(O^{(r)}, I^{(r+1)})$ need to be randomly
generated linear self-equivalences of $SL$. If the self-equivalences are generated
using the method from [37], then $SE_L(SL)$ can be parameterized by a bit vector $c$
of length $2n + 5$, where $n$ is the SPECK word size. We do not describe the full
parametrization of $SE_L(SL)$ here; instead, it can be found in the Python project
code. For any encoded matrix $\overline{M}^{(r)}$, $c^{(r-1)}$ and $c^{(r)}$ fully define
$I^{(r)}$ and $O^{(r)}$, respectively. In other words, if it is possible to recover
these bit vectors, an attacker can re-generate $I^{(r)}$ and $O^{(r)}$, peel off the
self-equivalence encodings, and compute the round keys and external encodings.

We will now describe a method to recover $c^{(r-1)}$ and $c^{(r)}$ for any
intermediate round $r$. Let $X$ and $Y$ be the matrix forms of the unknown
self-equivalence encodings, $I^{(r)}$ and $O^{(r)}$, respectively. Combining this with
the definition of $\overline{M}^{(r)}$, we obtain the following equation:

$$\overline{M}^{(r)} = Y M^{(r)} X \tag{3}$$

As $I^{(r)}$ and $O^{(r)}$ are generated using the method from [37], each entry in the
matrices $X$ and $Y$ is parameterized by $c^{(r-1)}$ and $c^{(r)}$, respectively.
Furthermore, $M^{(r)}$ and $\overline{M}^{(r)}$ are $2n \times 2n$ matrices known to the
attacker. By looking at each entry of these matrices individually, Eq. (3) can be
written as a system of $(2n)^2$ equations in $2 \times (2n + 5)$ unknowns, the bits in
$c^{(r-1)}$ and $c^{(r)}$. Let $\alpha_i$ be the unknowns corresponding to $X$ and
$\beta_i$ the unknowns corresponding to $Y$. Now, let $R$ be the Boolean polynomial
ring in these variables, that is:

$$R = \mathbb{F}_2[\alpha_i, \beta_i] / (\alpha_i^2 + \alpha_i, \beta_i^2 + \beta_i), \quad \text{for } 1 \le i \le 2n + 5.$$


Of course, depending on the density of $X$, $Y$, and $M^{(r)}$, many of these
equations might not include any $\alpha_i$ or $\beta_i$ variables. However, by
computing the Gröbner basis $G$ of the ideal defined by these equations in $R$, it is
possible to uniquely determine the values of $\alpha_i$ and $\beta_i$. We verified
this experimentally for every SPECK word size. This in turn reveals the values of
$c^{(r-1)}$ and $c^{(r)}$.

An attacker can use this method to recover $c^{(r)}$ from $\overline{M}^{(r+1)}$ for
$1 \le r \le m$, where $m$ is the number of SPECK key words. The attacker then
re-generates the self-equivalences $(O^{(r)}, I^{(r+1)})$. Because $\overline{v}^{(r)}$
is always publicly known, the attacker can compute

$$(O^{(r)})^{-1} \overline{v}^{(r)} = (O^{(r)})^{-1} O^{(r)} v^{(r)} = v^{(r)}$$

to obtain the round keys $k^{(r)}$ for $r = 1, 2, \ldots, m$.
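
The peel-off step is plain linear algebra over $\mathbb{F}_2$. The sketch below inverts a binary matrix with Gauss-Jordan elimination and strips it from an encoded vector. The matrix `O` is a random invertible matrix standing in for a re-generated $O^{(r)}$ (an assumption: it is not sampled from $SE_L(SL)$), rows are stored as integer bitmasks, and all names are our own.

```python
import random


def mat_vec(rows, v):
    """Multiply a binary matrix (list of row bitmasks) by the bit vector v."""
    return sum((bin(row & v).count("1") & 1) << i for i, row in enumerate(rows))


def mat_inv(rows):
    """Invert a binary matrix over F_2; raises StopIteration if it is singular."""
    size = len(rows)
    # Augment each row with the matching identity row in the high bits.
    aug = [row | (1 << (size + i)) for i, row in enumerate(rows)]
    for col in range(size):
        pivot = next(i for i in range(col, size) if (aug[i] >> col) & 1)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(size):
            if i != col and (aug[i] >> col) & 1:
                aug[i] ^= aug[col]
    return [row >> size for row in aug]


def random_invertible(size, rng):
    """Rejection-sample a random invertible binary matrix."""
    while True:
        rows = [rng.getrandbits(size) for _ in range(size)]
        try:
            mat_inv(rows)
            return rows
        except StopIteration:
            continue


rng = random.Random(3)
size = 32                                   # 2n for SPECK32
O = random_invertible(size, rng)            # stand-in for a recovered O^(r)
v = rng.getrandbits(size)                   # unencoded constant holding the round key
v_enc = mat_vec(O, v)                       # what the white-box implementation stores
print(mat_vec(mat_inv(O), v_enc) == v)
```

The same inversion routine would also cover the products used to isolate the external encodings, since those only involve multiplying known matrices by the inverse of a known matrix.
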
Similarly, an attacker can recover $c^{(1)}$ from $\overline{M}^{(2)}$ and re-generate
the self-equivalence $(O^{(1)}, I^{(2)})$. As with $\overline{v}^{(r)}$,
$\overline{M}^{(1)}$ is always publicly known, so the attacker can compute

$$(O^{(1)} M^{(1)})^{-1} \overline{M}^{(1)} = (O^{(1)} M^{(1)})^{-1} O^{(1)} M^{(1)} I^{(1)} = I^{(1)}$$

to obtain the input external encoding $I^{(1)}$.
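Finally, an attacker can recover $c^{(n_r-1)}$ from $\overline{M}^{(n_r-1)}$ and
re-generate the self-equivalence $(O^{(n_r-1)}, I^{(n_r)})$. The attacker then computes

$$\overline{M}^{(n_r)} (M^{(n_r)} I^{(n_r)})^{-1} = O^{(n_r)} M^{(n_r)} I^{(n_r)} (M^{(n_r)} I^{(n_r)})^{-1} = O^{(n_r)}$$

to obtain the output external encoding $O^{(n_r)}$.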

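Note that it is not possible to recover $c^{(1)}$ from $\overline{M}^{(1)}$ or
$c^{(n_r-1)}$ from $\overline{M}^{(n_r)}$. Because $M^{(1)}$ and $M^{(n_r)}$ are
multiplied by $I^{(1)}$ and $O^{(n_r)}$, respectively, which are random permutations
rather than self-equivalences of $SL$, the parameterized form of Eq. (3) does not
hold for these layers.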

The most expensive operation in this attack is computing the Gröbner basis
for $4n^2$ equations in $2 \times (2n + 5)$ variables. Unfortunately, it is notoriously
difficult to estimate the time complexity required to compute the Gröbner basis.
Instead, we implemented this attack in Python [1] using SageMath [38] and the
PolyBoRi framework [17]. We executed this implementation using a single core
on a laptop with an AMD Ryzen 7 PRO 3700U CPU, running Linux 5.15.5. The
attack took only 16.08 s to recover the master key and external encodings from
a white-box SPECK128/256 instance. The full results for every SPECK instance
can be found in [39], Appendix A.

Clearly, it is feasible to execute this attack using even modest consumer 
hardware. We conclude that a white-box SPECK implementation using only linear 
encodings is insecure against key-recovery attacks, even with relatively limited 
capabilities. In particular, it is not necessary to inspect or modify the execution of 
the white-box implementation. Furthermore, recovering the encodings is possible 
using only the information revealed by a single encoded affine layer. 


4.2 Security Analysis of Affine Self-equivalences 


Knowing that a white-box SPECK implementation using only linear encodings is 
insecure, we can try to extend this attack to the full design using affine encodings. 
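We start by updating the equations for $\overline{M}^{(r)}$ and $\overline{v}^{(r)}$
with affine self-equivalence encodings $(I^{(r)}, i^{(r)})$ and $(O^{(r)}, o^{(r)})$:

$$\begin{aligned}
\overline{M}^{(r)} &= O^{(r)} M^{(r)} I^{(r)}, \quad \text{for } 1 \le r \le n_r, \\
\overline{v}^{(r)} &= O^{(r)} (v^{(r)} \oplus M^{(r)} i^{(r)}) \oplus o^{(r)}, \quad \text{for } 1 \le r \le n_r.
\end{aligned}$$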


Once again, we will try to recover the coefficients used to generate a random
affine self-equivalence $((O^{(r)}, o^{(r)}), (I^{(r+1)}, i^{(r+1)}))$. In this case, if
the self-equivalences are generated using the method from [37], then $SE_A(SL)$ can be
parameterized by a bit vector $c$ of length $2n + 11$, where $n$ is the SPECK word
size. We previously showed that $c^{(r)}$ can easily be recovered from
$\overline{M}^{(r)}$ when only linear encodings are used. However, we found that some
coefficients are exclusively used in the constants $i^{(r)}$ and $o^{(r)}$. As a
result, we also need to use the definition of $\overline{v}^{(r)}$ to recover the full
value of $c^{(r)}$. Furthermore, to simplify our implementation, we will
simultaneously recover the bit vector $k^{(r)}$, the round key bits for round $r$.

Instead of uniquely determining $c^{(r-1)}$, $c^{(r)}$, and $k^{(r)}$ for a round $r$,
we will describe a method to generate possible configurations for these coefficients
and key bits. Let $(X, x)$ and $(Y, y)$ be the matrix-vector forms of the unknown
self-equivalence encodings, $(I^{(r)}, i^{(r)})$ and $(O^{(r)}, o^{(r)})$, respectively.
First, we apply the definition of $\overline{M}^{(r)}$ again to obtain $(2n)^2$
equations, similar to the first step in the attack on linear self-equivalences
(Eq. (3)). Let $\alpha_i$ be the unknowns corresponding to $(X, x)$, $\beta_i$ the
unknowns corresponding to $(Y, y)$, and $R_1$ the Boolean polynomial ring in these
variables, that is:

$$R_1 = \mathbb{F}_2[\alpha_i, \beta_i] / (\alpha_i^2 + \alpha_i, \beta_i^2 + \beta_i), \quad \text{for } 1 \le i \le 2n + 11.$$

Computing the Gröbner basis $G_1$ of the ideal defined by these equations in
$R_1$ reveals the values of $4n + 15$ unknowns, slightly fewer than the total number,
$2 \times (2n + 11)$. Now, let $z$ represent the vector $v^{(r)}$, unknown to the
attacker. Combining this with $(X, x)$ and $(Y, y)$ and the definition of
$\overline{v}^{(r)}$, we obtain the following equation:

$$\overline{v}^{(r)} = Y (z \oplus M^{(r)} x) \oplus y \tag{4}$$


In this case, $x$ is parametrized by $c^{(r-1)}$, whereas $Y$ and $y$ are parametrized
by $c^{(r)}$. Furthermore, recall that $v^{(r)}$ contains $k^{(r)}$, the $n$ unknown
round key bits. As $M^{(r)}$ and $\overline{v}^{(r)}$ are known to the attacker, Eq. (4)
can be written as a system of $2n$ equations in $2 \times (2n + 11) + n$ unknowns. As
before, let $\alpha_i$ be the unknowns corresponding to $(X, x)$ and $\beta_i$ the
unknowns corresponding to $(Y, y)$. We now also introduce $\gamma_j$ to denote the
unknowns corresponding to $z$. Let $R_2$ be the Boolean polynomial ring in these
variables, that is:

$$R_2 = R_1[\gamma_j] / (\gamma_j^2 + \gamma_j), \quad \text{for } 1 \le j \le n.$$


However, because the values of $4n + 15$ unknowns were revealed by $G_1$, we also
define the quotient ring $Q = R_2 / G_1$. Finally, we again compute the Gröbner basis
$G_2$ of the ideal defined by the equations of $\overline{v}^{(r)}$ in $Q$. This
uniquely determines the values of all but one of the unknowns, resulting in two
possible configurations for $c^{(r-1)}$, $c^{(r)}$, and $k^{(r)}$.

An attacker can then follow the same process described in Sect. 4.1 to recover
the possible round keys and external encodings $I^{(1)}$ and $O^{(n_r)}$. In total,
$2^{m+1}$ possible configurations must be enumerated, with $m$ the number of key
words. Because $m$ is at most 4 for SPECK, this exponential function is no problem in
practice. We implemented this attack in Python [1] using SageMath [38] and
PolyBoRi [17] and executed it using the same setup used for linear encodings.
Now the attack took 42.00 s to recover the master key and external encodings
from a white-box SPECK128/256 instance. Again, the full results can be found
in [39], Appendix A.

Although this attack is certainly more expensive than the one for linear 
encodings, the master key and external encodings are still easily extracted in 
practice. Consequently, we must conclude that our white-box SPECK method is 
insecure in the white-box model. However, a higher level of security might be 
achieved by using quadratic, cubic, and even quartic self-equivalences. 


5 Implementation 


In previous sections, we discussed the theoretical foundations of our method to
construct white-box SPECK implementations. To research the practical viability
of this method, we also implemented a program to generate white-box SPECK
code. This project is publicly available in our GitHub repository². Our program
to generate white-box SPECK implementations³ is written in Python, a free and
open source programming language [1]. We chose Python because its source
code is completely portable across platforms, programming in Python is
comparatively simple, and it is possible to interact with SageMath using a language
interface [38]. SageMath is a free and open source mathematics package, which
is used extensively for mathematical computations throughout the project.


² https://github.com/jvdsn/white-box-speck.

Our program generates white-box SPECK implementations in four major 
steps. First, the program takes the block size 2n and master key k as input. 
Using the SPECK key schedule, k is transformed into round keys $k^{(r)}$, and $M^{(r)}$
and $v^{(r)}$ (see Sect. 4.1) are computed. Then, for each round r, random self-
equivalence encodings are generated to encode $M^{(r)}$ and $v^{(r)}$. In this step, the
random external encodings are also generated and applied to round 1 and $n_r$.
Finally, using $M^{(r)}$ and $v^{(r)}$, the program generates the output code, a white-box
SPECK implementation and its inverse external encodings.

Currently, this output code is exclusively C source code. We chose the C 
programming language because it is widely used, provides fast low-level memory 
control, and contains a convenient interface for single instruction, multiple data 
(SIMD) functions. However, our project could easily be adapted to return source 
or compiled code for other programming languages. 

The generated C code follows the same intuitive pattern as simple SPN cipher
implementations. For each round, a modular addition and affine transformation
are performed, except for the final round, which consists only of the affine
transformation. Because the white-box SPECK encryption algorithm operates on
vectors of bits instead of integers, the inputs x and y have to be converted to bits
first. Similarly, the state vector xy has to be converted back to integers after
encryption. No key expansion is necessary, as the round keys $k^{(r)}$ are encoded
in the affine layers.

The encryption function relies on five subroutines: functions to convert to 
and from bits, a function to perform the modular addition on xy, a function
to perform the matrix-vector product, and a function to perform the vector 
addition. Conversion to and from binary is done big-endian. The other three 
functions use a standard textbook implementation. For example, the modular 
addition simply performs the addition with carry algorithm on each individual 
bit, ignoring the final carry to perform the modular reduction. In the case of the 
matrix-vector product, two for loops are used to compute the resulting vector. 
For the vector addition, the generated code performs an XOR operation for each 
bit in the vector. We call this the default code generation strategy; in the next 
section we will consider techniques to implement these functions more efficiently. 
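
The five subroutines described above can be sketched in a few lines. The following Python model is only an illustration of the default strategy, not the generated C code itself: the function names are hypothetical, and we assume that the modular addition adds the two halves x and y of the state as separate big-endian bit vectors.

```python
"""Illustrative model of the default strategy's subroutines (names are hypothetical)."""


def to_bits(value, n):
    """Convert an n-bit integer to a big-endian list of bits (MSB first)."""
    return [(value >> (n - 1 - i)) & 1 for i in range(n)]


def from_bits(bits):
    """Convert a big-endian list of bits back to an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def modular_addition(x_bits, y_bits):
    """Addition with carry on individual bits; dropping the final carry reduces mod 2^n."""
    n = len(x_bits)
    out = [0] * n
    carry = 0
    for i in range(n - 1, -1, -1):  # the LSB is stored last in big-endian order
        s = x_bits[i] + y_bits[i] + carry
        out[i] = s & 1
        carry = s >> 1
    return out


def matrix_vector_product(matrix, vector):
    """Matrix-vector product over F_2 using two nested for loops."""
    result = [0] * len(matrix)
    for i, row in enumerate(matrix):
        for j, entry in enumerate(row):
            result[i] ^= entry & vector[j]
    return result


def vector_addition(u, v):
    """Vector addition over F_2: one XOR per bit."""
    return [a ^ b for a, b in zip(u, v)]


if __name__ == "__main__":
    x, y = to_bits(0xFFFF, 16), to_bits(0x0002, 16)
    print(hex(from_bits(modular_addition(x, y))))
```

Every bit occupies its own list element here, which mirrors why the default strategy is simple but wasteful in both space and time.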

Finally, apart from the definitions and implementations of these subroutines,
the required data (matrices $M^{(r)}$ and vectors $v^{(r)}$) will also have to be stored
in the C source code. A straightforward way of storing a matrix in C is to use
a two-dimensional array: storing each row as an array of the elements in an
enclosing array to represent the full matrix. A vector can be stored by simply
using a single one-dimensional array. In total, $n_r + 1$ two-dimensional arrays
and $n_r + 1$ one-dimensional arrays are generated by the default code generation
strategy.


3 Our project currently only supports the generation of white-box SPECK encryption
code. However, the existing project could easily be modified to also generate
white-box SPECK decryption implementations. When discussing the generated code in this
section, we always refer to SPECK encryption.


5.1 Code Generation Strategies 


Although the method described previously generates correct and functional C 
code, this code is far from optimal. We introduce five additional code generation 
strategies to improve the efficiency of the generated C code. 


Sparse matrix code generation Because the entries of $M^{(r)}$ are in $\mathbb{F}_2$, one
could consider storing only the nonzero entries to save disk space. The other
entries are then implicitly known to be 0. We call this the sparse matrix
representation. In addition to reducing the disk space used by the generated
C code, using the sparse matrix representation also simplifies the matrix-
vector product. Similar to the sparse matrix representation, we can also use
a sparse vector representation for the vectors $v^{(r)}$. The vector addition can
also be modified to take advantage of the sparse vector representation.

Inlined code generation Before the C code is generated, the contents of $M^{(r)}$
and $v^{(r)}$ are already known. Therefore, it is possible to generate $n_r + 1$ different
functions for the matrix-vector product and for the vector addition. In the
case of the matrix-vector products, these functions will only contain the array 
operations for the nonzero entries in the matrix. Similarly, the functions for 
the vector additions only modify the positions for the nonzero entries in the 
vector. In this way, the data is inlined in the function implementations. 

Bit-packed code generation The C standard library contains data types to 
store 16-bit, 32-bit, and 64-bit unsigned integers. Instead of storing the bits 
individually in an integer data type, we can use these larger data types to 
store multiple bits simultaneously, bit-packing n bits in an n-bit unsigned 
integer. This will considerably reduce the disk space usage and improve the 
execution time of the generated C code. When n = 24 or n = 48, the data 
must be stored in 32-bit or 64-bit unsigned integers, respectively. 

Inlined bit-packed code generation This code generation strategy combines 
the previous two strategies. n bits are bit-packed in an n-bit unsigned integer,
and used in $n_r + 1$ different functions for the matrix-vector product and for
the vector addition. Compared to the inlined strategy, this method has the 
advantage of the state vector xy being bit-packed. Compared to the bit- 
packing method, we might expect a performance improvement as a result of 
the loop unrolling in the inlined functions. 

SIMD code generation We extend the bit-packed code generation with 
instructions from the Advanced Vector Extensions (AVX) and Advanced Vec- 
tor Extensions 2 (AVX2) instruction sets. Single instruction, multiple data 
(SIMD) allows algorithms to operate on multiple pieces of data, called vec- 
tors, at the same time. For example, sixteen 16-bit integers could be combined 
into a 256-bit SIMD vector, which could then be manipulated using SIMD
instructions. For the sake of simplicity, our implementation does not consider
n = 24 and n = 48.
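
To make the difference between these representations concrete, the sketch below models the matrix-vector product over $\mathbb{F}_2$ in three ways: dense, sparse, and bit-packed. It is written in Python for brevity and is only an illustration under our own assumptions; the helper names are hypothetical and the generated C code may be organized differently.

```python
"""Three equivalent F_2 matrix-vector products (illustrative, names are hypothetical)."""


def dense_matvec(matrix, vector):
    """Default representation: one list element per matrix entry."""
    return [sum(a & b for a, b in zip(row, vector)) & 1 for row in matrix]


def to_sparse(matrix):
    """Sparse representation: keep only the column indices of nonzero entries."""
    return [[j for j, entry in enumerate(row) if entry] for row in matrix]


def sparse_matvec(sparse_rows, vector):
    """Only the nonzero positions of each row are visited."""
    result = []
    for columns in sparse_rows:
        bit = 0
        for j in columns:
            bit ^= vector[j]
        result.append(bit)
    return result


def pack(bits):
    """Bit-pack a big-endian list of bits into a single integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def packed_matvec(packed_rows, packed_vector):
    """Each output bit is the parity of (row AND vector)."""
    result = 0
    for row in packed_rows:
        parity = bin(row & packed_vector).count("1") & 1
        result = (result << 1) | parity
    return result


if __name__ == "__main__":
    m = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
    v = [1, 1, 0, 1]
    print(dense_matvec(m, v))
    print(sparse_matvec(to_sparse(m), v))
    print(bin(packed_matvec([pack(r) for r in m], pack(v))))
```

In the bit-packed variant a whole row is processed with one AND and one parity computation, which is the source of the speed-up that the bit-packed and SIMD strategies aim for.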


5.2 Comparison 


To provide a comprehensive comparison of the SPECK encryption performance 
for the unprotected and self-equivalence implementations, we tested three dif- 
ferent variants: SPECK32/64, SPECK64/128, and SPECK128/256. We did not 
test the block sizes 48 and 96, as these parameters are not supported by all 
code generation strategies. For every variant, we used the keys from the original 
SPECK test vectors to perform the encryptions [3]. However, the choice and length 
of key should not have an impact on the performance of the self-equivalence 
implementations. Furthermore, to ensure a fair comparison, the same affine self-
equivalence encodings were used when generating C code using different strategies.
We give an overview of the results for each of the three variants; the full
details of the experiments can be found in [39], Appendix B.

In the case of SPECK32/64, the unprotected reference implementation takes
up 16 320 bytes of disk space, with the smallest self-equivalence implementation,
the bit-packed implementation, using only 19 552 bytes of disk space. To compare
the performance, 1 000 000 random encryptions were performed upon execution
of the program. On average, the unprotected implementation finished this in
0.22 s at 4.0 GHz, reaching a throughput of 220 cycles per byte (c/b). The most
efficient self-equivalence implementation, again the bit-packed implementation,
is considerably slower, taking on average 2.26 s, which results in a throughput of
2260 c/b.

Because unprotected implementations do not store matrices and vectors 
which depend on the block size, the required disk space for the unprotected 
SPECK64/128 implementation stays the same. The smallest self-equivalence 
implementations are the SIMD and bit-packed implementations, using 31 072
and 31 080 bytes respectively. For a block size of 64 bits, 300 000 encryptions were
executed, which results in an average execution time of 0.08 s for the unprotected
implementation. This is equivalent to a throughput of 133 c/b. Here, the SIMD
strategy also produces the fastest code (1.31 s, 2183 c/b); however, the bit-packed
code is only slightly behind (1.51 s, 2517 c/b).

Finally, for SPECK128/256 implementations, the sparse matrix representa- 
tion requires a similar amount of disk space on average (95 345.6 bytes) compared 
to the bit-packed (88 760 bytes) or SIMD (88 752 bytes) implementations. While 
code generated using these strategies still takes up six times the amount of disk 
space of an unprotected implementation, it still improves on the default self- 
equivalence implementation with a reduction of 85%. For this block size, the 
number of random encryption iterations was set to 100000. The experimental 
results show an average throughput of 125 c/b for the unprotected implemen- 
tation, while the bit-packed code is the most performant self-equivalence imple- 
mentation, reaching a throughput of 2825 c/b on average. 
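
The throughput figures above follow directly from the measured times. As a small worked check, the sketch below recomputes cycles per byte as (time × clock frequency) / (number of encryptions × block size in bytes). It assumes that the 4.0 GHz clock stated for SPECK32/64 applies to all measurements; the function name is hypothetical.

```python
def cycles_per_byte(seconds, clock_hz, encryptions, block_bytes):
    """Throughput in cycles per byte for a batch of block encryptions."""
    return seconds * clock_hz / (encryptions * block_bytes)


CLOCK_HZ = 4.0e9  # assumed to hold for every measurement

if __name__ == "__main__":
    # SPECK32/64: 1 000 000 encryptions of 4-byte blocks
    print(round(cycles_per_byte(0.22, CLOCK_HZ, 1_000_000, 4)))  # unprotected
    print(round(cycles_per_byte(2.26, CLOCK_HZ, 1_000_000, 4)))  # bit-packed
    # SPECK64/128: 300 000 encryptions of 8-byte blocks
    print(round(cycles_per_byte(0.08, CLOCK_HZ, 300_000, 8)))    # unprotected
    print(round(cycles_per_byte(1.31, CLOCK_HZ, 300_000, 8)))    # SIMD
    print(round(cycles_per_byte(1.51, CLOCK_HZ, 300_000, 8)))    # bit-packed
```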
6 Conclusion


In this work, we introduced the first academic method to protect white-box 
SPECK implementations using self-equivalence encodings. We showed that these 
encodings can be applied to SPECK rounds, ostensibly hiding the round keys 
in encoded affine layers. Similar techniques could be used to protect other ARX 
ciphers, such as SALSA20 [5], CHACHA20 [4], or THREEFISH [25]. We also analyzed 
the security of our design against key-recovery attacks. We presented practical 
attacks to fully recover the self-equivalence encodings and external encodings of 
a self-equivalence implementation, showing that our method is completely inse- 
cure in the white-box model. Finally, we created a Python project to generate 
self-equivalence implementations using our method. We used this project to cal- 
culate the impact of our method on the performance of SPECK. Furthermore, we 
were able to compare five additional strategies to generate output C code, and 
determined an overall optimal strategy: bit-packed code generation. 

One possible area for future research is the generation of self-equivalences. 
In this paper, we only employed linear and affine self-equivalences. Extending 
this design to quadratic, cubic, or higher-degree self-equivalences could result in 
more secure white-box implementations. Alternatively, the security of our cur- 
rent method could be analyzed using several approaches in the white-box model. 
In particular, we did not consider the known techniques based on side-channel 
analysis, such as differential fault analysis [7] and differential computation anal- 
ysis [12]. Our Python project could be used to test and compare the efficiency of 
several attacks on white-box implementations using self-equivalence encodings. 
Abstract. Since the differential-linear cryptanalysis was introduced by
Langford and Hellman in 1994, there have been many works inherit- 
ing and developing this technique. It has been used to attack numer- 
ous ciphers, and in particular, sets the record for Serpent, ICEPOLE, 
Chaskey, 8-round AES, and so on. In CRYPTO 2020, Beierle et al. 
showed that the data complexity of differential-linear attack can be sig- 
nificantly reduced by generating enough right pairs artificially. In this 
paper, we manage to find the property in the differential propagation 
of modular addition. Based on this, we can select special bits to flip to 
produce right pairs in a certain differential-linear attack. For application, 
we focus on the differential-linear attack of the ARX cipher Speck32/64. 
With the differential-linear trail we concatenate, we construct 9-round 
and 10-round distinguishers with the correlation of 21458 and 21458, 
respectively. Then we use enough flipped bits to reduce the complexity 
of the key recovery attack. As a result, we can use only 2”° chosen plain- 
texts to attack 14-round Speck32/64 with the time complexity of about 
2°? which has a slight improvement than before. To our best knowledge, 
this is the first differential-linear attack of the Speck family. 


Keywords: Differential-linear cryptanalysis + ARX + Speck32/64 


1 Introduction 


ARX ciphers are one popular category in symmetric key primitives. ARX is
short for three operations: addition modulo $2^n$, word-wise rotation and XOR.
There are many famous examples of ARX-based designs such as block ciphers 
FEAL [29], Speck [6], stream ciphers Salsa20 [10], ChaCha [9], MAC algorithms 
Chaskey [27], Siphash [3] and hash functions Skein [19], BLAKE [4]. Due to 
the simple composition of the design, ARX constructions are quite friendly for 
software implementation. With the extensive design and use of ARX ciphers, 
their security analysis has caught cryptanalysts’ attention in recent years. 

When it comes to the security of ARX ciphers, the two best known techniques
in symmetric cryptography, differential cryptanalysis [11] and linear
cryptanalysis [26], have to be taken into consideration. In previous works, the
differential and linear properties of the only non-linear operation in ARX
ciphers, modular addition, had been well understood [23,31].

© Springer Nature Switzerland AG 2022

However, as designers and cryptanalysts push each other forward, a long-trail
strategy [17] has emerged for designing ARX-based ciphers, analogous to the
wide-trail strategy [16] in AES. Because of this, pure differential or linear
attacks fail to achieve outstanding results in many cases when analyzing many
rounds of ARX ciphers. Fortunately, in this setting, differential-linear
cryptanalysis, which was introduced by Langford and Hellman [21] in
1994, shows its advantage.

Informally, the differential-linear analysis combines a differential
characteristic with probability p for the first s rounds and a linear characteristic with
correlation q for the next t rounds, resulting in a differential-linear characteristic
covering s+t rounds with correlation $pq^2$ and an attack with a data complexity
of roughly $p^{-2}q^{-4}$.

From several recent works [15,24], we have seen the potential of differential- 
linear attack in ARX ciphers. One of the most prominent results is the improved 
differential-linear framework on ARX primitives presented by Beierle et al. [7], 
who cleverly observe that the complexity of a differential-linear attack can be 
reduced immediately, if the attacker could construct enough right pairs for the 
differential part. 

Inspired by their work, we study the differential properties of the modular 
addition, and find that the differential propagation has a decreasing nature under 
certain restrictions. Based on this unique property, we can determine the special
bits in the specified differential characteristic and use them to generate enough
right pairs for free. As a demonstration, we apply this method to attack
round-reduced Speck32/64.


Related Works. Among many ARX-based ciphers, the Speck family is one of 
the most attractive primitives for researchers. Since it was published in 2013, 
it has received numerous analyses from the cryptographic community. Due to 
the limitation of space, we only list representative works here. Abed et
al. [1] presented the results of the Speck family against the differential attack
and rectangle attack. At the same time, Biryukov et al. [12] applied another
advanced search strategy to find better differential characteristics than before.
In [18], Dinur proposed a sub-cipher attack and carried out the key recovery 
attack based on the best differential by that time. In [32], Yao et al. provided 
the results of all variants of Speck against the linear attack. In [25], Liu et al. 
analyzed the security of Speck against the rotational cryptanalysis with xored 
constants proposed in [2]. Most recently, as a suitable target instance, many 
works based on machine learning have been applied to the Speck family [8,20]. 
Even though there have been so many related works, the differential-linear
attack has not been used to evaluate the security of the Speck family so far.


Our Contributions. In this paper, we first study the differential propagation
in the modular addition. By extending the definition of the XOR differential to the
bit level and combining a simple proposition of differential weight, we observe
that there is a monotonic property in the differential propagation.

Then we show that we can improve the data complexity of a certain 
differential-linear attack by using those bits which we call valid flipped bits. 
As proof of this, we present one concrete example and one generic example in 
the differential of modular addition with 8-bit inputs and outputs. 

Next, we introduce our new differential-linear attack on round-reduced
Speck32/64. With regard to the differential-linear characteristic, we apply a restrictive
search strategy and obtain a 7-round optimal trail as a result. Combined with 
the short differential, we can construct effective distinguishers on 9-round and 
10-round Speck32/64. We find 21 and 24 valid flipped bits for them, respectively. 

Finally, with the above improvements and a trivial 1-round filter, we can
mount 13-round and 14-round key recovery attacks with improved data
complexity and improved time complexity. For comparison, we summarize our attack
and some other results in Table 1.¹


Table 1. Summary of attacks on Speck32/64 


Rounds Attacked / | Time | Data | Memory Type of Attack 
11 Qe: ioe? - Neural Differential [20] 
11 946.68 | 930.07 | 937.1 Rectangle [1] 
12 990.22 930:8 - Linear [32] 
13 oe? I2 322 Differential [18] 
13 39 || 924 °° Differential-linear(Sec 4.4) 
14 DPS es oe Differential [18] 
14 DRETI POA] 92 Differential [30] 
14 952 | 925 329 Differential-linear(Sec 4.4) 


Organization of the Paper. The rest of the paper is organized as follows: 
Sect. 2 provides the description of Speck and recalls some relevant definitions and 
propositions on the differential properties of ARX ciphers. Section 3 revisits the 
differential-linear cryptanalysis and explains our improvement on it. Section 4 
describes our new results on round-reduced Speck32/64. Section 5 concludes
the paper.


2 Preliminaries 


For any $x \in \mathbb{F}_2^n$, we denote by $x[i]$ the $i$-th bit of $x$, where $x[n-1]$ represents
the most significant bit (MSB). We use $x[0:n]$ to denote the $n$-bit string
$x$. Thus we have $x = \sum_{i=0}^{n-1} x[i] 2^i$. Further, an $n$-bit string with few 1 bits
is simply denoted as $x\{i,j,k\}$ according to the locations of the 1 bits in $x$. We denote
by $\neg x$ the bit-wise inverse of a value $x$. Furthermore, we denote by $\oplus$ the bit-wise
XOR, by $+$ the addition modulo $2^n$, and by $\lll$ and $\ggg$ the word-wise left and right
rotations. We also denote the correlation of a binary random variable $X$ as
$\mathrm{Cor}(X) = 2 \cdot \Pr[X = 0] - 1$.

1 Source code about our experiment can be found in
https://github.com/regnik/speck_dl_analysis.


Improved Differential-Linear Attack to Speck32/64 795


2.1 Description of Speck 


Speck is a family of block ciphers designed by the U.S. National Security Agency 
in 2013. It has 10 versions due to different block size 2n and key size mn. For 
example, Speck32/64 refers to the variant of the Speck family with block size 32 
bits and key size 64 bits. 

The Speck family takes two n-bit words as input and outputs two n-bit words 
after a sequence of T rounds. 

The round function is defined as


$$x_{i+1} = ((x_i \ggg \alpha) + y_i) \oplus k_i$$
$$y_{i+1} = (y_i \lll \beta) \oplus x_{i+1}$$


The key schedule of Speck generates successive round keys from the master key
using the same function as the round function. It is defined as


$$l_{i+m-1} = (k_i + (l_i \ggg \alpha)) \oplus i$$
$$k_{i+1} = (k_i \lll \beta) \oplus l_{i+m-1}$$

The rotation constants $(\alpha, \beta)$ are (7,2) for Speck32 and (8,3) for the other
variants. For intuitiveness, Fig. 1 describes the round function and key schedule of
Speck. For more details, one can refer to [6].
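
As an illustration of the round function and key schedule above, the following Python sketch implements Speck32/64 (n = 16, m = 4, rotation constants (7, 2)). The number of rounds (22) and the test vector are taken from the public Speck specification as we recall it, not from the text above, and the function names are our own.

```python
"""Reference-style sketch of Speck32/64; function names are hypothetical."""

MASK = 0xFFFF        # n = 16 bit words
ALPHA, BETA = 7, 2   # rotation constants for Speck32
ROUNDS = 22          # assumption: round count of Speck32/64 in the specification


def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK


def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def round_function(x, y, k):
    """One Speck round: x' = ((x >>> a) + y) ^ k, y' = (y <<< b) ^ x'."""
    x = ((ror(x, ALPHA) + y) & MASK) ^ k
    y = rol(y, BETA) ^ x
    return x, y


def expand_key(key_words, rounds=ROUNDS):
    """key_words = (l2, l1, l0, k0); the round counter i plays the role of the key."""
    k = [key_words[-1]]
    l = list(reversed(key_words[:-1]))
    for i in range(rounds - 1):
        new_l, new_k = round_function(l[i], k[i], i)
        l.append(new_l)   # l_{i+m-1}
        k.append(new_k)   # k_{i+1}
    return k


def encrypt(x, y, round_keys):
    for k in round_keys:
        x, y = round_function(x, y, k)
    return x, y


def decrypt(x, y, round_keys):
    for k in reversed(round_keys):
        y = ror(x ^ y, BETA)
        x = rol(((x ^ k) - y) & MASK, ALPHA)
    return x, y


if __name__ == "__main__":
    keys = expand_key((0x1918, 0x1110, 0x0908, 0x0100))
    ct = encrypt(0x6574, 0x694C, keys)
    print([hex(w) for w in ct])
```

Reusing `round_function` inside `expand_key` reflects the remark that the key schedule applies the same function as the round function.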


Fig. 1. Left: The round function of Speck. Right: The key schedule of Speck. $R^i$ is
the round function where $i$ acts as the round key $k_i$.
2.2 The Differential Properties of ARX Ciphers


For the differential properties of ARX ciphers, we just need to pay attention
to the only non-linear operation, addition modulo $2^n$. We follow the notions in
the previous work [13,23]. We denote by $\mathrm{xdp}^+$ the XOR-differential probability
of addition modulo $2^n$.


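Definition 1. ($\mathrm{xdp}^+$ [13]) Let $\alpha, \beta, \gamma$ be fixed n-bit differences. Then the XOR-
differential probability of addition modulo $2^n$, denoted by $\mathrm{xdp}^+(\alpha, \beta \to \gamma)$,
where $\alpha, \beta$ are input differences and $\gamma$ is the output difference, is defined as:


$$\mathrm{xdp}^+(\alpha, \beta \to \gamma) = 2^{-2n} \cdot \#\{(x, y) : ((x \oplus \alpha) + (y \oplus \beta)) \oplus (x + y) = \gamma\}.$$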


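Proposition 1. (weight of $(\alpha, \beta \to \gamma)$ [23]) Let $\alpha, \beta, \gamma$ be fixed n-bit
differences. The weight of the differential $(\alpha, \beta \to \gamma)$, denoted by $w(\alpha, \beta \to \gamma)$, can
be computed as:


$$w(\alpha, \beta \to \gamma) = -\log_2(\mathrm{xdp}^+(\alpha, \beta \to \gamma)) = h(\neg \mathrm{eq}(\alpha, \beta, \gamma)),$$


where $\mathrm{eq}(x, y, z) = (\neg x \oplus y) \wedge (\neg x \oplus z)$ and $h(x)$ denotes the number of non-zero
bits in $x$ except the MSB.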
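
Definition 1 and Proposition 1 can be cross-checked on small word sizes. The sketch below computes $\mathrm{xdp}^+$ by exhaustive counting and compares it with $2^{-w}$ from the weight formula. Note one addition that is not stated above: the weight formula only applies to differentials of non-zero probability, so the sketch first applies the standard validity test of Lipmaa and Moriai as we recall it. All function names are hypothetical.

```python
"""Brute-force xdp+ versus the closed-form weight (illustrative sketch)."""


def xdp_add_bruteforce(alpha, beta, gamma, n):
    """Definition 1: count the pairs (x, y) that follow the differential."""
    mask = (1 << n) - 1
    count = 0
    for x in range(1 << n):
        for y in range(1 << n):
            if ((((x ^ alpha) + (y ^ beta)) ^ (x + y)) & mask) == gamma:
                count += 1
    return count / (1 << (2 * n))


def eq(x, y, z, n):
    """Bit i of the result is 1 exactly when x[i] = y[i] = z[i]."""
    mask = (1 << n) - 1
    return (~x ^ y) & (~x ^ z) & mask


def is_valid(alpha, beta, gamma, n):
    """Assumed validity test (Lipmaa-Moriai): non-zero probability check."""
    mask = (1 << n) - 1
    sa, sb, sg = (alpha << 1) & mask, (beta << 1) & mask, (gamma << 1) & mask
    return (eq(sa, sb, sg, n) & (alpha ^ beta ^ gamma ^ sb)) == 0


def weight(alpha, beta, gamma, n):
    """Proposition 1: h(not eq(alpha, beta, gamma)), ignoring the MSB."""
    mask = (1 << n) - 1
    not_eq = ~eq(alpha, beta, gamma, n) & mask & ((1 << (n - 1)) - 1)
    return bin(not_eq).count("1")


def xdp_add_formula(alpha, beta, gamma, n):
    if not is_valid(alpha, beta, gamma, n):
        return 0.0
    return 2.0 ** -weight(alpha, beta, gamma, n)


if __name__ == "__main__":
    n = 4
    for a, b, g in [(0b0010, 0b0010, 0b0000), (0b0010, 0b0010, 0b0100)]:
        print(a, b, g, xdp_add_bruteforce(a, b, g, n), xdp_add_formula(a, b, g, n))
```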


If we extend Definition 1 to the bit level, more precisely, if we limit the
two input differences to a single active bit, then with the aid of Proposition 1 we
obtain the following. Let $\alpha = 2^i, \beta = 2^i, \gamma = 0$; then $\mathrm{xdp}^+(2^i, 2^i \to 0) = 1/2$. Let
$\alpha = 2^i, \beta = 2^i, \gamma = 2^{i+1}$; then $\mathrm{xdp}^+(2^i, 2^i \to 2^{i+1}) = 1/2^2$. Let
$\alpha = 2^i, \beta = 2^i, \gamma = 2^{i+1} + 2^{i+2}$; then $\mathrm{xdp}^+(2^i, 2^i \to 2^{i+1} + 2^{i+2}) = 1/2^3$,
and so on. Based on this, we can observe the following property:


Proposition 2. (Monotonicity of $\mathrm{xdp}^+[i]$) Let $\alpha\{i\}, \beta\{i\}$ be fixed n-bit
differences with only one non-zero bit located at the i-th LSB of $\alpha, \beta$, and let $\emptyset$
denote no difference. Then the probability with which the differential bits $\alpha[i]$
and $\beta[i]$ propagate to the output differential bits of $\gamma$ is monotonically decreasing
following the direction from $i$ to the second MSB:


$$\mathrm{xdp}^+[\{i\}, \{i\} \to \emptyset] > \mathrm{xdp}^+[\{i\}, \{i\} \to \{i+1\}] > \cdots > \mathrm{xdp}^+[\{i\}, \{i\} \to \{i+1, i+2, \ldots, n-2\}]$$

$$\mathrm{xdp}^+[\{i\}, \emptyset \to \{i\}] > \mathrm{xdp}^+[\{i\}, \emptyset \to \{i, i+1\}] > \cdots > \mathrm{xdp}^+[\{i\}, \emptyset \to \{i, i+1, i+2, \ldots, n-2\}]$$
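
The two chains of Proposition 2 can be observed numerically. The following sketch evaluates both chains by exhaustive counting for a small word size; the choice n = 6, i = 1 and the helper names are purely illustrative.

```python
"""Numerical illustration of the two monotone chains (small n, brute force)."""


def xdp_add(alpha, beta, gamma, n):
    """XOR-differential probability of addition modulo 2^n by exhaustive counting."""
    mask = (1 << n) - 1
    hits = sum(
        1
        for x in range(1 << n)
        for y in range(1 << n)
        if ((((x ^ alpha) + (y ^ beta)) ^ (x + y)) & mask) == gamma
    )
    return hits / (1 << (2 * n))


def bits(positions):
    """Build a difference from the set of its active bit positions."""
    return sum(1 << p for p in positions)


def chain_both_active(n, i):
    """xdp+[{i},{i} -> {}], [{i},{i} -> {i+1}], ..., up to {i+1..n-2}."""
    return [
        xdp_add(bits([i]), bits([i]), bits(range(i + 1, i + 1 + k)), n)
        for k in range(n - 1 - i)
    ]


def chain_one_active(n, i):
    """xdp+[{i},{} -> {i}], [{i},{} -> {i,i+1}], ..., up to {i..n-2}."""
    return [
        xdp_add(bits([i]), 0, bits(range(i, i + 1 + k)), n)
        for k in range(n - 1 - i)
    ]


if __name__ == "__main__":
    print(chain_both_active(6, 1))
    print(chain_one_active(6, 1))
```

In both chains each additional active output bit costs one more carry condition, so the probability halves at every step.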


3 The Differential-Linear Attack 


3.1 Langford and Hellman’s Differential-Linear Cryptanalysis 
Revisited 


Let $E$ be a cipher which we can divide into two sub parts $E_1$ and $E_2$ such that
$E = E_2 \circ E_1$. The attacker applies a differential trail $\delta_i \xrightarrow{E_1} \delta_m$ and a linear
approximation $\Gamma_m \xrightarrow{E_2} \Gamma_o$ and then explores the particular link between the
ciphertexts.

Specifically, suppose that the differential trail holds with probability $p$, that
is to say,
$$\Pr[E_1(x) \oplus E_1(x \oplus \delta_i) = \delta_m] = p.$$

Further, suppose that the linear approximation has a correlation $q$, that is to
say,
$$\mathrm{Cor}[\Gamma_m \cdot x \oplus \Gamma_o \cdot E_2(x)] = q.$$
Under the assumption that $E_1(x)$ and $E_2(x)$ are independent, we have
$$\mathrm{Cor}[\Gamma_o \cdot E(x) \oplus \Gamma_o \cdot E(x \oplus \delta_i)] = pq^2.$$


Thus, if $p, q$ are large enough, the attacker can distinguish the cipher $E$ from a
random permutation by preparing $O(p^{-2}q^{-4})$ chosen plaintexts.


3.2 Differential-Linear Cryptanalysis with Experimental Middle 
Part 


Note that the above analysis is heuristic due to the assumption of the
independence of the sub parts. However, the sub parts and the whole cipher are not
independent in some cases. To estimate the complexity of the analysis more
accurately, an experimental middle part is naturally added. Consequently, the cipher
$E$ is divided into three sub parts $E_1$, $E_2$ and $E_m$ such that $E = E_2 \circ E_m \circ E_1$.
Here, the middle part $E_m$ should be experimentally evaluated. Suppose we have
the middle differential-linear approximation $\delta_m \xrightarrow{E_m} \Gamma_m$ with correlation $r$, that
is to say,
$$\mathrm{Cor}[\Gamma_m \cdot E_m(x) \oplus \Gamma_m \cdot E_m(x \oplus \delta_m)] = r.$$
Then, we have
$$\mathrm{Cor}[\Gamma_m \cdot E_m(E_1(x)) \oplus \Gamma_m \cdot E_m(E_1(x \oplus \delta_i))] = pr.$$
Finally, we can obtain that
$$\mathrm{Cor}[\Gamma_o \cdot E(x) \oplus \Gamma_o \cdot E(x \oplus \delta_i)] = prq^2.$$


For a formal explanation of the middle differential-linear trail, one can refer
to a recent work which proposed the Differential-Linear Connectivity Table
(DLCT) [5].
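
The relation between the three parameters and the attack cost can be made explicit with a small calculation. The sketch below evaluates the distinguisher correlation $prq^2$ and the corresponding data complexity, taken as the inverse square of the correlation, on a log scale. The numeric values in the example are hypothetical and are not taken from any attack in this paper.

```python
import math


def dl_correlation(p, q, r=1.0):
    """Correlation p * r * q^2 of a differential-linear distinguisher.

    With r = 1 this reduces to the two-part structure with correlation p * q^2.
    """
    return p * r * q ** 2


def dl_data_complexity_log2(p, q, r=1.0):
    """log2 of the data complexity estimate p^-2 * r^-2 * q^-4."""
    return -2.0 * math.log2(dl_correlation(p, q, r))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the arithmetic.
    p, r, q = 2.0 ** -4, 2.0 ** -3, 2.0 ** -2
    print(math.log2(dl_correlation(p, q, r)))   # exponent of the correlation
    print(dl_data_complexity_log2(p, q, r))     # exponent of the data complexity
```

Since $p$ enters the data complexity squared, removing the differential probability from the estimate, as discussed for right pairs in the next section, has a large effect.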

In this paper, we apply the latter of the above two structures (see Fig. 2).
Specifically, we adopt a strategy similar to that of [7,22] to determine the middle part.
We first restrict ourselves to searching for differential-linear characteristics with one
active input bit and one active output bit. Then, for the chosen active bits,
we search for the differential characteristics before and the linear characteristics
after the middle part, respectively. Finally, we concatenate these three parts according
to this structure.


2 In some literature, the linear approximation is measured by bias. Note that for a 
specific linear approximation, the correlation is twice the bias. And the analysis holds 
also. 
Fig. 2. Differential-linear structure with the experimental middle part


3.3 Improvement upon the Differential Part for ARX Ciphers 


For a usual differential-linear attack, the adversary needs to use $O(p^{-2}r^{-2}q^{-4})$
chosen plaintexts to construct a valid distinguisher at first. It means that, at the
end of the differential part, the adversary has to collect $O(r^{-2}q^{-4})$ right pairs.
In other words, the complexity of the attack comes largely from the procedure
of generating eligible plaintext pairs. Yet, for a typical ARX cipher, we can
significantly improve this by exploiting the properties we observe in Sect. 2. This
has been discussed before in [7]. Here, we revisit and further extend the idea.
Let us be given a cipher $E$. We have obtained the first sub part $E_1 : \mathbb{F}_2^n \to \mathbb{F}_2^n$
and have found a differential characteristic $\delta_i \to \delta_m$ with probability $p$. This
means that, among all $x \in \mathbb{F}_2^n$, we expect $2^n \cdot p$ pairs to satisfy the trail. We
observe the ciphertexts of the pairs $(x, x \oplus \delta_i)$ to filter out those that do not satisfy the
given differential trail, which holds with probability $p$. However, a lot of repetitive work is
done in this stage. In fact, the right pairs have a particular structure
that this procedure ignores. If we are given a right pair, we can generate many other
right pairs by making minor modifications. This can be naturally applied to reduce
the data complexity in a classical differential-linear attack. We use $\mathcal{X}$ to denote the set of all
values of $x$ that define a right pair $(x, x \oplus \delta_i)$. Our goal is, given one $x \in \mathcal{X}$,
to collect enough other elements of $\mathcal{X}$. In particular, if we consider only
those pairs in $\mathcal{X}$, the expected correlation would increase to $rq^2$, a gain of a factor $1/p$
over the previous value. We call the bits whose values do not affect the given differential
"valid flipped bits". Note that the number of valid flipped bits should be large enough.
In particular, denoting this number by $b$, we require $2^b > r^{-2}q^{-4}$.

Algorithm 1: Finding valid flipped bits
Input: input difference $\delta_i$, output difference $\delta_m$, sample space N, lower bound
of probability B
Output: indexes of valid flipped bits
Initialize counter c = 0 and r[0 : n-1] = 0;
for i = 1 : N do
    Randomly pick a valid input x;
    if $E_1(x) \oplus E_1(x \oplus \delta_i) = \delta_m$ then
        increase c;
        for j = 0 : n-1 do
            Flip the j-th bit of the original x, denoted by $\hat{x}$;
            if $E_1(\hat{x}) \oplus E_1(\hat{x} \oplus \delta_i) = \delta_m$ then
                increase r[j];
            end
        end
    end
end
for j = 0 : n-1 do
    p[j] = r[j] / c;
    if p[j] > B then
        save j;
    end
end

Since it is hard to know the complete distribution of right pairs, we start with
a trivial analysis. Assume that $E_1$ can be easily described by two independent
parts, called $E_1^0$ and $E_1^1$, where $E_1^0, E_1^1 : \mathbb{F}_2^m \to \mathbb{F}_2^m$ and
$n = 2m$. We now have a differential $\alpha \xrightarrow{E_1^1} \beta$ with probability $p$, so we can
extend it to the differential $(0, \alpha) \xrightarrow{E_1} (0, \beta)$, which holds with the same probability. This
suggests that we have found $m$ valid flipped bits and thus we can generate $2^m$
right pairs freely with only one known right pair. However, only a few ciphers can
be treated in this way. For most ciphers, we need to use an automatic search.
The basic idea is shown in Algorithm 1. In detail, the algorithm first takes
sample pairs as input, then filters out the wrong pairs, and finally returns, for each
bit, the probability that a right pair stays right when that bit is flipped. We are
interested in the bits with high probabilities, which we use to construct the subspace.
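
Algorithm 1 translates almost line by line into code. The sketch below is one possible Python rendering under our own assumptions: the first sub part is passed in as a callable, and the toy example uses a 16-bit input holding two 8-bit words (x, y) with $E_1(x, y) = x + y \bmod 2^8$ and the differential $\alpha\{3\}, \beta\{3,6\} \to \gamma\{6\}$ of the 8-bit addition example studied later in this section. Instead of random sampling, the example enumerates all $2^{16}$ inputs so that the result is deterministic. All names are hypothetical.

```python
"""Sketch of Algorithm 1 on a toy 8-bit modular addition (names are hypothetical)."""


def find_valid_flipped_bits(e1, delta_in, delta_out, n_bits, samples, bound):
    """Return (per-bit probabilities, indexes whose probability exceeds bound)."""
    c = 0
    r = [0] * n_bits
    for x in samples:
        if e1(x) ^ e1(x ^ delta_in) != delta_out:
            continue  # not a right pair, filtered out
        c += 1
        for j in range(n_bits):
            x_hat = x ^ (1 << j)  # flip the j-th bit of the original input
            if e1(x_hat) ^ e1(x_hat ^ delta_in) == delta_out:
                r[j] += 1
    probs = [r[j] / c for j in range(n_bits)] if c else [0.0] * n_bits
    valid = [j for j, prob in enumerate(probs) if prob > bound]
    return probs, valid


def toy_e1(v):
    """Toy first sub part: bits 0..7 hold x, bits 8..15 hold y, output x + y mod 2^8."""
    x, y = v & 0xFF, (v >> 8) & 0xFF
    return (x + y) & 0xFF


ALPHA = 1 << 3                 # alpha{3}
BETA = (1 << 3) | (1 << 6)     # beta{3,6}
GAMMA = 1 << 6                 # gamma{6}
DELTA_IN = (BETA << 8) | ALPHA

if __name__ == "__main__":
    probs, valid = find_valid_flipped_bits(
        toy_e1, DELTA_IN, GAMMA, 16, range(1 << 16), 0.95
    )
    for j, prob in enumerate(probs):
        name = f"x[{j}]" if j < 8 else f"y[{j - 8}]"
        print(name, round(prob, 4))
    print("valid flipped bits:", valid)
```

The returned probabilities make the choice of the bound B tangible: bits with probability exactly 1 are free, while bits slightly below 1 trade a larger subspace against some wrong pairs.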

At CRYPTO 2020, Beierle et al. used this trivial technique during the
differential part, together with some other ideas, to significantly improve the cryptanalysis results
of two typical ARX ciphers, Chaskey and ChaCha. In their work, ChaCha is the
case that can be simply divided into two independent input parts so that the
subspace of flipped bits can be naturally found, while Chaskey depends on
experiments with the aid of automatic search. They set the lower bound of
probability to 0.95. We are not aware of the rationale for this value, but we assume
it was chosen iteratively after numerous trials. In this paper, we develop a new
technique to help us find more valid flipped bits. We study the conditions in the
propagation of the differential. Moreover, we are able to know the exact effect
of each bit on the given differential.

Fig. 3. The 8-bit addition example. $x, y, z$ are 8-bit inputs and outputs, while $\alpha, \beta, \gamma$
are 8-bit input differences and output differences.

Consider a simple addition model cipher with two 8-bit inputs and one
8-bit output (see Fig. 3), and let us be given a valid differential trail $\alpha, \beta \to \gamma$.
We call one valid input $(x, y)$ and another valid input $(x \oplus \alpha, y \oplus \beta)$ a pair.
A pair which satisfies $(x + y) \oplus ((x \oplus \alpha) + (y \oplus \beta)) = \gamma$ is called a right
pair. We consider the carry bits $c$. For $z = x + y$, we define $c[0] = 0$ and
$c[i+1] = (x[i] \wedge y[i]) \vee (x[i] \wedge c[i]) \vee (y[i] \wedge c[i])$. Following the notions in Sect. 2.2,
we start our analysis with a concrete example.


Example 1. a {3}, @ {3,6} — 7 {6}. For this example, one case is shown in Fig. 4. 
From the example, we can see that a[3] and [3] jointly lead to @ in 4[3]. In 
detail, the difference in a[3] and [3] is eliminated because 0 + 1 = 1+0 and 
none of them generate carry to z[4]. If x[3] on the left is changed to value 1, this 
phenomenon will not happen. The value of «[3] and y/3] can be regarded as the 
first condition to generate the given differential. We can also see that 3/6] leads to 
4[6]. Intuitively, this may give us another condition. Since z[6] = x[6]@y[6] @c[6], 
the difference on this position comes from [6], y[6] and c[6] together. We can 
observe that the value of y[6] has no effect on the difference while the value of 
x|6] does. For any right pair satisfying this differential, if we change x[3] or æ[6], 
the pair would not stay right since the modification in the two positions has 
broken the conditions with which the difference can propagate. Moreover, due 
to the carry bit, for z[0 : 5] and y[0 : 5], we have to concern about how the 
value of the location affects c[6]. With the aid of the proposition in Sect. 2, we 
can easily evaluate these things. When we change x[5], it has a probability of 
50% that the pair keeps right. Because xdp*[5,@ — 5] = 1/2, in other words, 
the probability that the change does not disturb the output difference is 1/2. 




Obviously, if we flip $x[7]$, this differential is unaffected. For each bit of the input,
we are able to compute the probability that flipping this bit does not turn the right
pair into a wrong one. We also summarize the experimental results in Fig. 5, and the
experiment fits our expectations well. Note that our result is tested over a sample
space of size $2^{16}$.


10010010 10011010 
+11011101 +10010101 
01101111 00101111 


Fig. 4. One right pair of $\alpha\{3\}, \beta\{3,6\} \to \gamma\{6\}$.
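
The per-bit experiment of Example 1 can be reproduced by exhaustive search. The following is a minimal sketch, not the authors' code: it enumerates all $2^{16}$ inputs of the 8-bit addition, keeps the right pairs, and measures how often a pair stays right after one input bit is flipped in both members. The function names and the index convention (0-7 for x, 8-15 for y, as in Fig. 5) are our own assumptions.

```python
# Toy 8-bit modular addition from Example 1: alpha{3}, beta{3,6} -> gamma{6}.
ALPHA, BETA, GAMMA = 0x08, 0x48, 0x40
MASK = 0xFF


def is_right_pair(x, y):
    """True if (x, y) and (x ^ ALPHA, y ^ BETA) give output difference GAMMA."""
    z = (x + y) & MASK
    z_prime = ((x ^ ALPHA) + (y ^ BETA)) & MASK
    return (z ^ z_prime) == GAMMA


def stay_right_probabilities():
    """Return (number of right pairs, 16 probabilities); index 0-7 = x, 8-15 = y."""
    right = [(x, y) for x in range(256) for y in range(256) if is_right_pair(x, y)]
    probs = []
    for index in range(16):
        if index < 8:
            # Flipping a bit of x flips it in both members, since x' = x ^ ALPHA.
            kept = sum(is_right_pair(x ^ (1 << index), y) for x, y in right)
        else:
            kept = sum(is_right_pair(x, y ^ (1 << (index - 8))) for x, y in right)
        probs.append(kept / len(right))
    return len(right), probs


if __name__ == "__main__":
    count, probs = stay_right_probabilities()
    print("right pairs:", count)
    for index, p in enumerate(probs):
        print(index, p)
```

Under these assumptions the search agrees with the discussion above: flipping x[3] or x[6] always destroys the right pair, flipping x[5] keeps it right for exactly half of the right pairs, and flipping the most significant bits never matters.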


The example above extends to a more generic situation. Note that the MSB of a modular
addition is always a valid flipped bit, so we ignore it in the following analysis. We can
then observe some interesting properties of several typical differentials. For a
differential $\alpha\{i\}, \beta\{i\} \to \emptyset$, a right pair must satisfy
$x[i] \neq y[i]$. Flipping any position except $x[i]$ or $y[i]$ has no effect on the
original difference, which means that we can find 14 valid flipped bits for this specific
differential. For a differential $\alpha\{i\}, \beta\{i\} \to \gamma\{i+1\}$, a right pair
must satisfy not only $x[i] = y[i]$ but also $x[i+1] = y[i+1]$, and we can find 12 valid
flipped bits. For a differential $\alpha\{i\}, \emptyset \to \gamma\{i\}$, the output
difference must be limited to the single active bit, so a right pair must satisfy
$y[i] = c[i]$. Thus we naturally find $(7 - i + 1) + [7 - (i + 1) + 1] = 15 - 2i$ valid
flipped bits. Since $c[i]$ is affected by the bits below position $i$, changing
$x[0 : i-1]$ or $y[0 : i]$ destroys this condition with a corresponding probability. If we
flip $x[j]$ ($0 \le j < i$), the pair stays right with the same probability as the
probability that $\alpha[j]$ does not propagate to $\gamma[i]$.
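
The three counts stated above (14, 12 and $15 - 2i$) can be checked by brute force for a concrete position. The sketch below is only an illustration under our own naming: `valid_flipped_bits` lists the input bits whose flip keeps every right pair right for an n-bit addition.

```python
def valid_flipped_bits(alpha, beta, gamma, n=8):
    """List the input bits that can be flipped without losing any right pair."""
    mask = (1 << n) - 1

    def is_right(x, y):
        return (((x + y) ^ ((x ^ alpha) + (y ^ beta))) & mask) == gamma

    pairs = [(x, y) for x in range(1 << n) for y in range(1 << n) if is_right(x, y)]
    valid = []
    for j in range(n):
        if all(is_right(x ^ (1 << j), y) for x, y in pairs):
            valid.append(("x", j))
    for j in range(n):
        if all(is_right(x, y ^ (1 << j)) for x, y in pairs):
            valid.append(("y", j))
    return valid


if __name__ == "__main__":
    i = 2
    bit = 1 << i
    print(len(valid_flipped_bits(bit, bit, 0)))         # alpha{i}, beta{i} -> empty
    print(len(valid_flipped_bits(bit, bit, bit << 1)))  # alpha{i}, beta{i} -> gamma{i+1}
    print(len(valid_flipped_bits(bit, 0, bit)))         # alpha{i}, empty -> gamma{i}
```

The counts hold for positions below the MSB, which matches the remark that the MSB is treated separately.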


3.4 Improvement upon the Differential-Linear Distinguisher 


With the improvement upon the differential part, we can construct a differential-linear
distinguisher with lower cost than before. In particular, if we can generate enough
right pairs for the differential part, then we can use these pairs to distinguish the
cipher with lower data complexity, which works as follows (the notation follows the
earlier analysis in this paper):




[Figure 5: scatter plot; vertical axis: probabilities (0.00 to 1.00); horizontal axis:
indexes of flipped bits (0 to 15).]

Fig. 5. The probabilities that the pair stays right when index[i] is flipped, where x
covers 0-7 and y covers 8-15.


1. Collect enough right pairs $(x, x')$.
2. Compute $\mathrm{Cor}[\Gamma_{out} \cdot E(x) \oplus \Gamma_{out} \cdot E(x')]$.
3. If we can observe the correlation of $rq^{2}$ with $O(r^{-2}q^{-4})$ pairs, we consider
this distinguisher to be effective.


4 Differential-Linear Attack on Round-Reduced 
Speck32/64 


4.1 The Overview of Our Attack 


We now give an overview of our attack. First, we divide the cipher into three sub-parts.
For Speck32/64, $E_1$ covers 1 or 2 rounds, $E_m$ covers 7 rounds and $E_2$ covers
1 round. We search for appropriate trails for the three parts separately. Then, we look
for special bits to flip in the differential part in order to reduce the data complexity
of the whole attack. Finally, we give the complete key recovery attack on round-reduced
Speck32/64.


4.2 Searching for Appropriate Trails 


To determine the exact number of rounds covered by the three sub-parts, it is natural to
list all possible combinations of $E = E_2 \circ E_m \circ E_1$ and find all valuable
differential and linear trails for each. However, with limited computational resources it
takes too much time to select the best among all combinations. In fact, by observing many
good differential trails and good linear trails, we find that most of them have a special
internal difference with only one active bit, which is called an “hourglass structure”
in [22]. So for the middle part of our attack, we restrict ourselves to differential-linear
trails with a single active input bit and a single active output bit. In other words, we
evaluate all the correlations $\mathrm{Cor}(E_m(x)[j] \oplus E_m(x \oplus 2^{i})[j])$. Once
the middle part has been determined, the output difference and the input mask of the two
remaining parts are fixed accordingly. By traversing all the differential trails with this
output difference and all linear trails with this input mask, the differential-linear
distinguisher is constructed.

For Speck32/64, we have observed the following 7-round differential-linear trail, denoted
by DL, with correlation $r = 2^{-7.58}$:


0x1000 0000 → 0x0001 0000


Under the restriction of the output difference, we have two differential characteristics
denoted by D1 and D2:


Notation | Input Difference | Output Difference | Rounds covered | Probability (log2)
D1       | 0x000a 0400      | 0x1000 0000       | 1              | -2
D2       | 0x8440 8102      | 0x1000 0000       | 2              | -5


Under the restriction of the input mask, we have observed a 1-round linear trail, denoted
by L, with correlation $2^{-1}$:


0x0001 0000 → 0x0e00 0c00


Note that this linear trail is derived from the classical linear approximation of modular
addition. Based on the analysis of Sect. 3, we immediately obtain two differential-linear
distinguishers: a 9-round distinguisher D1 → DL → L with correlation about $2^{-11.58}$,
and a 10-round distinguisher D2 → DL → L with correlation about $2^{-14.58}$.
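
The two correlations follow from the usual differential-linear estimate, in which a differential of probability $p$, a middle part of correlation $r$ and a linear trail of correlation $q$ combine to roughly $p \cdot r \cdot q^{2}$. The short sketch below redoes this arithmetic in log2; the helper names are ours, and the last lines illustrate (as our own reading of Sect. 3.4) that once only right pairs are used, $p$ drops out and the correlation becomes $rq^{2}$.

```python
def dl_correlation_log2(p_log2, r_log2, q_log2):
    """log2 of the estimate p * r * q^2 for a differential-linear distinguisher."""
    return p_log2 + r_log2 + 2 * q_log2


def data_log2(correlation_log2):
    """log2 of the order of magnitude of pairs needed, i.e. correlation^(-2)."""
    return -2 * correlation_log2


if __name__ == "__main__":
    r, q = -7.58, -1.0
    print(dl_correlation_log2(-2, r, q))   # 9-round distinguisher D1 -> DL -> L
    print(dl_correlation_log2(-5, r, q))   # 10-round distinguisher D2 -> DL -> L
    right_only = dl_correlation_log2(0, r, q)  # right pairs only: r * q^2
    print(right_only, data_log2(right_only))
```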


4.3 Flipping Special Bits in Differential Characteristics 


To improve the complexity of the distinguishers listed above, we now examine the valid
flipped bits in the differential trails. For D1, we can apply the results of the example
directly. We easily find $x[3:6]$ and $y[13:15]$ as flipped bits with probability 1. After
computing all the probabilities, we choose the 14 highest-ranked remaining bits, which are
$x[7:13]$ and $y[0:6]$. Since the probability of these flipped bits is not 1, we need to
determine the exact probability of obtaining a right pair from which $2^{21}$ pairs can be
generated. Experimentally, we find it to be $2^{-0.095}$. Using Algorithm 1 with the bound
set to 0.95, we obtain the same results. For D2, there are no trivial flipped bits with
probability 1. We therefore use Algorithm 1 directly and set the bound to 0.7, but still
cannot find enough valid flipped bits. For the first round of D2, however, we can find 24
valid flipped bits with probability 1, which are $x[0:6]$, $x[10:14]$, $y[1]$, $y[4:7]$
and $y[9:15]$.
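
The choice of flipped bits for D1 can be illustrated by sampling. The sketch below is not Algorithm 1 of the paper; it is a plain Monte Carlo estimate under our own assumptions (function names, sample size, seed). It uses the fact that for D1 only the modular addition of the first round matters: if the left output difference is 0x1000, the right output difference is automatically zero.

```python
import random

MASK16 = 0xFFFF
ALPHA = 7                      # right-rotation amount of Speck32/64
D1_IN = (0x000A, 0x0400)       # input difference of D1 (left word, right word)
D1_OUT_LEFT = 0x1000           # left word of the output difference of D1


def ror(v, r):
    return ((v >> r) | (v << (16 - r))) & MASK16


def is_right(x, y):
    """Key-independent check that (x, y) starts a right pair of D1."""
    z = (ror(x, ALPHA) + y) & MASK16
    z_prime = (ror(x ^ D1_IN[0], ALPHA) + (y ^ D1_IN[1])) & MASK16
    return (z ^ z_prime) == D1_OUT_LEFT


def estimate(n_right=2000, seed=0):
    """Estimate, per input bit, the probability that a right pair stays right."""
    rng = random.Random(seed)
    right = []
    while len(right) < n_right:
        x, y = rng.getrandbits(16), rng.getrandbits(16)
        if is_right(x, y):
            right.append((x, y))
    probs = {}
    for bit in range(16):
        probs[("x", bit)] = sum(is_right(x ^ (1 << bit), y) for x, y in right) / n_right
        probs[("y", bit)] = sum(is_right(x, y ^ (1 << bit)) for x, y in right) / n_right
    return probs


if __name__ == "__main__":
    for key, p in sorted(estimate().items()):
        print(key, round(p, 3))
```

In this sketch the bits x[3:6] and y[13:15] never break a right pair, while x[7:13] and y[0:6] break it only rarely, which is consistent with the selection described above.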


4.4 Extending the Distinguishers to Key Recovery


We first show that Speck32/64 has a natural 1-round filter with which we can check
whether an input pair is a right pair of a given 1-round differential without any
information on the key bits. Under the single-key setting, it is obvious that
$((x_i \ggg \alpha) + y_i) \oplus ((x'_i \ggg \alpha) + y'_i) =
(((x_i \ggg \alpha) + y_i) \oplus k_i) \oplus (((x'_i \ggg \alpha) + y'_i) \oplus k_i)$.
This means that we can filter out wrong pairs in advance for free, and we apply this
simple trick in the following attacks.
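
The filter can be illustrated with a few lines of code. This is a minimal sketch with our own function names: it implements one Speck32/64 round and checks a pair against a 1-round differential using the all-zero subkey, which is legitimate because the subkey cancels in the output difference.

```python
MASK16 = 0xFFFF
ALPHA, BETA = 7, 2


def ror(v, r):
    return ((v >> r) | (v << (16 - r))) & MASK16


def rol(v, r):
    return ((v << r) | (v >> (16 - r))) & MASK16


def speck_round(x, y, k):
    """One round of Speck32/64 on the state (x, y) with subkey k."""
    x = ((ror(x, ALPHA) + y) & MASK16) ^ k
    y = rol(y, BETA) ^ x
    return x, y


def round_output_difference(x, y, in_diff, k):
    """Output difference of one round for the pair (x, y), (x, y) ^ in_diff."""
    x1, y1 = speck_round(x, y, k)
    x2, y2 = speck_round(x ^ in_diff[0], y ^ in_diff[1], k)
    return x1 ^ x2, y1 ^ y2


def passes_filter(x, y, in_diff, out_diff):
    """Key-free filter: the subkey cancels, so the zero subkey can be used."""
    return round_output_difference(x, y, in_diff, 0) == out_diff


if __name__ == "__main__":
    d1_in, d1_out = (0x000A, 0x0400), (0x1000, 0x0000)
    print(passes_filter(0x0000, 0x0400, d1_in, d1_out))
    print(passes_filter(0x0000, 0x0000, d1_in, d1_out))
```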

Then, the distinguishers (see Fig. 6) can be extended to key-recovery attacks by adding
one round before and three rounds after, so that we can mount 13-round and 14-round
practical attacks on round-reduced Speck32/64, respectively, as follows:


9-Round Distinguisher for the 13-Round Key Recovery Attack. We 
now have a 9-round differential-linear trail: 


$\delta_{in}$ = 0x000a 0400 → 0x1000 0000 → 0x0e00 0c00 = $\Gamma_{out}$


We describe the attack as follows: 


1. Collect $2^{2} \cdot 2^{0.095}$ pairs of plaintexts $(P_i, P'_i)$ such that their
difference after the first round equals $\delta_{in}$. This procedure is consistent with
the step mentioned in [1].

2. For each pair, partially encrypt it and check whether it is a right pair of the
differential. Note that this does not need any guess on $k[1]$. One pair is expected to
remain after this step. If so, keep it and generate $2^{21}$ pairs from it using the 21
valid flipped bits we examined.

3. Request the ciphertexts $(C_i, C'_i)$ of these pairs under the unknown master key.

4. Initialize a list of $2^{37}$ counters to zero. For each $(C_i, C'_i)$, try all
possible values of the 37 key bits $k_{10}[0:4]$, $k_{11}$, $k_{12}$, and partially decrypt
$(C_i, C'_i)$ to the intermediate state corresponding to the output mask of our
differential-linear distinguisher. Compute the XOR sum of the subset of bits contained in
$\Gamma_{out}$; if the values for both members of the pair are equal, increase the
current counter.

5. Sort the counters by correlation. The right subkey is expected to be among the first
$2^{37-a}$ values of the list, where $a$ is the bit advantage we can obtain.


The attack needs a data complexity of about $2 \cdot 2^{21} = 2^{22}$. Using the formula³
in [28], under this data complexity we obtain a 2-bit advantage with a success probability
of about 89%. The time complexity, which is dominated by step 4, is
$2^{22} \cdot 2^{37} = 2^{59}$. The attack also needs $2^{37} \cdot 4 = 2^{39}$ memory
accesses.


3 The formula was adapted to differential-linear cryptanalysis in [14] and we use the 
adapted version here. 

Fig. 6. Our new distinguishers


10-Round Distinguisher for the 14-Round Key Recovery Attack. We now have a 10-round
differential-linear trail:


$\delta_{in}$ = 0x8440 8102 → 0x000a 0400 → 0x1000 0000 → 0x0e00 0c00 = $\Gamma_{out}$


With little modification, we describe the attack as follows: 


1. Collect 2° pairs of plaintexts $(P_i, P'_i)$ such that their difference after the first
round equals $\delta_{in}$. This procedure is consistent with the step mentioned in [1].

2. For each pair, partially encrypt it and check whether it is a right pair of the
differential 0x8440 8102 → 0x000a 0400. Again, this procedure does not need any guess on
$k[1]$. One pair is expected to remain after this step as well. If so, keep it and
generate $2^{24}$ pairs from it using the 24 valid flipped bits we examined.

3. Request the ciphertexts $(C_i, C'_i)$ of these pairs under the unknown master key.

4. Initialize a list of $2^{37}$ counters to zero. For each $(C_i, C'_i)$, try all
possible values of the 37 key bits $k_{11}[0:4]$, $k_{12}$, $k_{13}$, and partially decrypt
$(C_i, C'_i)$ to the intermediate state corresponding to the output mask of our
differential-linear distinguisher. Compute the XOR sum of the subset of bits contained in
$\Gamma_{out}$; if the values for both members of the pair are equal, increase the
current counter.

5. Sort the counters by correlation. The right subkey is expected to be among the first
$2^{37-a}$ values of the list, where $a$ is the bit advantage we can obtain.


This attack needs a data complexity of $2 \cdot 2^{24} = 2^{25}$, and the time complexity,
which is dominated by the key guess in step 4, is about
$2 \cdot 2^{24} \cdot 2^{37} = 2^{62}$. Again, using the formula in [28], under this data
complexity we obtain a 1-bit advantage with a success probability of about 91%. The attack
also needs $2^{37} \cdot 4 = 2^{39}$ memory accesses.


5 Conclusion


In this paper, we first study differential propagation in modular addition. We then find
that a certain differential-linear attack can be naturally improved by introducing valid
flipped bits in the differential part. In addition, we present the first differential-linear
attack against Speck32/64, and we improve its complexity by using special bits in the
differential part. Finally, we provide a key recovery attack on 14-round Speck32/64 whose
complexity improves on the best-known result. Although our analysis does not threaten the
security of Speck32/64, it shows the validity of our improved differential-linear analysis
for ARX ciphers. In the future, we will extend our technique to other ARX ciphers.


Abstract. At CRYPTO’19, A. Gohr proposed neural distinguishers for
the lightweight block cipher Speck32/64, achieving better results than 
the state-of-the-art at that point. However, the motivation for using 
that particular architecture was not very clear; therefore, in this paper, 
we study the depth-10 and depth-1 neural distinguishers proposed by 
Gohr [7] with the aim of finding out whether smaller or better-performing 
distinguishers for Speck32/64 exist. 

We first evaluate whether we can find smaller neural networks that 
match the accuracy of the proposed distinguishers. We answer this ques- 
tion in the affirmative with the depth-1 distinguisher successfully pruned, 
resulting in a network that remained within one percentage point of the 
unpruned network’s performance. Having found a smaller network that 
achieves the same performance, we examine whether its performance can 
be improved as well. We also study whether processing the input before 
giving it to the pruned depth-1 network would improve its performance. 
To this end, convolutional autoencoders were found that managed to 
reconstruct the ciphertext pairs successfully, and their trained encoders 
were used as a preprocessor before training the pruned depth-1 network. 
We found that, even though the autoencoders achieved a nearly perfect 
reconstruction, the pruned network did not have the necessary complex- 
ity anymore to extract useful information from the preprocessed input, 
motivating us to look at the feature importance to get more insights. 
To achieve this, we used LIME, with results showing that a stronger 
explainer is needed to assess it correctly. 


Keywords: Neural distinguisher · Feature importance · Speck · Pruning


1 Introduction 


Traditional symmetric cryptanalysis has shown only small improvements over time, and
researchers have started considering alternative ways to improve it. Since deep learning
has recently attracted much attention due to the significant advances in research
areas such as computer vision and speech recognition, it did not take long until
researchers also started to consider Deep Neural Networks (DNNs) in the area of
cryptography. DNNs are a family of non-linear machine learning classifiers that,
given a dataset and a loss function, try to learn the parameters that minimize the
loss. Using DNNs, A. Gohr was the first to achieve better results than the state of
the art at that time, revolutionizing cryptanalysis, i.e., the study of cryptographic
systems with the purpose of finding weaknesses [7]. Encouraged by Gohr’s results, more
papers followed that built upon his work, e.g., [9].

Starting from Gohr’s neural networks, the purpose of this paper is to inves- 
tigate whether there exists a smaller or better-performing neural network for 
executing a better distinguishing attack. Generally, in a distinguishing attack 
against a cryptographic primitive (a cipher in our case), the adversary tries to 
distinguish between (or classify) encrypted data and random data, thus helping 
in the cryptanalysis of the cipher. Specifically, if an adversary manages to dis- 
tinguish the output of a cipher from random data faster than a brute force key 
search, this is considered a break for the cipher. Thus, the cipher cannot be con- 
sidered secure enough to ensure the confidentiality of the encrypted information. 
These distinguishing attacks can be differential, in which case we talk about dif- 
ferential cryptanalysis, that is, cryptanalysis with regards to bitwise differences 
in the inputs given to the cipher [5]. In a differential attack, the non-random 
properties of the ciphertext pair produced by the cipher when given a plain- 
text pair with some known input difference are exploited for various purposes, 
one of which is distinguishing. Those differential attacks further branch into 
purely differential attacks, where the adversary uses only the ciphertext pair’s 
bitwise difference, and general differential attacks, where the information from 
the complete ciphertext pair is used [7]. Our work will focus on general differen- 
tial distinguishing attacks on the lightweight iterated block cipher Speck32/64 
achieved by neural networks. 


Motivation. While the application of neural networks in cryptanalysis evidently 
brings good practical results, it is also important to provide some theoretical 
support. Otherwise, the improvements make limited sense, as one cannot obtain 
guidance for the design and analysis of cryptographic primitives. Thus, it becomes
important to study the behavior of neural network distinguishers and the inter- 
pretability and explainability of such solutions. Unfortunately, deep learning 
explainability is a difficult problem that is not solved in general. Still, some 
observations are possible, especially from the perspective of the neural network 
size and the feature importance. 

Recent work either relied on a rather sophisticated technique exploiting Speck’s internal
state values obtained through brute-force key search [4] or deployed a model that required
k times more data [10]. In contrast, to keep the analysis simple, our paper remains in the
low-data setting. Concretely, only the plaintext inputs and ciphertext outputs are known.
In addition, the same training/test size and data format as in Gohr’s work are kept for
comparison. Specifically, we want to find out whether:

1. A smaller, equally well-performing distinguisher can be obtained by systematically
pruning Gohr’s distinguishers to the bare minimum needed to achieve their current
performance.

2. Preprocessing the input will improve the performance of Gohr’s (pruned) 
distinguishers. 


Main Contributions. We show that the state of the art in neural distinguishers can be
improved and that there are still multiple avenues to explore. We demonstrate this with
the following contributions, which, to the best of our knowledge, are the first studies of
their kind in the setting of neural differential distinguishers.


1. We evaluate the Lottery Ticket Hypothesis [6] on neural Speck distinguish- 
ers to see whether a smaller or better-performing network can be obtained, 
finding out that this is the case. Indeed, the Lottery Ticket Hypothesis states 
there are subnetworks that match or even outperform the accuracy of the 
original network. To obtain such subnetworks, we conduct pruning based on 
average activations equal to zero. 

2. We successfully strip the currently best neural distinguisher for Speck (the
depth-1 distinguisher), presenting a smaller network whose accuracy remains
within about one percentage point of the depth-1 distinguisher’s.

3. We successfully train autoencoders that achieve a nearly perfect reconstruction
of the given ciphertext pairs and study the performance of the proposed (and pruned)
Speck distinguishers when autoencoders perform prior feature engineering.

4. We study the importance of the inputs using Local Interpretable Model- 
agnostic Explanations (LIME) [15] to gain insights into the (pruned) distin- 
guishers’ behavior, which might aid in the improvement of future preprocess- 
ing methods. 


2 The Speck Family of Block Ciphers 


2.1 Notations and Conventions 


In this paper, the bitwise eXclusive-OR operation is denoted by $\oplus$, the bitwise AND
operation by $\wedge$, addition modulo $2^{k}$ by $\boxplus$, a left or right bitwise
rotation by $\lll$ and $\ggg$, respectively, and the concatenation of two bit strings $a$
and $b$ by $a \,\|\, b$. Furthermore, the Hamming weight hw of a bit string is the number
of ones present in it.


2.2 Speck Block Cipher 


The lightweight iterated block cipher Speck was designed by Beaulieu et al. for 
the US National Security Agency (NSA) with the intent of being efficient in 
software implementations on micro-controllers [3]. At its core, it comprises
three basic functions: modular Addition (modulo $2^{k}$), bitwise Rotation, and
bitwise eXclusive-OR of k-bit words, thus being an ARX construction. Since it
is an iterated block cipher, it has a round function (that is iterated), which, in
the case of Speck, is a simple Feistel structure. The round function
$F : \mathbb{F}_2^{k} \times \mathbb{F}_2^{2k} \to \mathbb{F}_2^{2k}$ takes as input a
k-bit subkey $K$ and the cipher’s internal state that consists of two k-bit words denoted
as $L_i$ and $R_i$, and computes the cipher’s next internal state as:


$L_{i+1} = ((L_i \ggg \alpha) \boxplus R_i) \oplus K$    (1)
$R_{i+1} = (R_i \lll \beta) \oplus L_{i+1}$    (2)


Here, $i$ and $i+1$ denote the current and the next round, respectively, and $\alpha$ and
$\beta$ are constants specific to each member of the Speck cipher family. The subkeys are
generated from a master key by a non-linear key schedule that uses the above-described
round function as its main operation, with details that change from one Speck member to
another.

As the key schedule will not be studied in this paper, please refer to [3] for
additional information. Concretely, for the Speck member studied in this paper,
the block size n is 32 bits, the word size k is 16 bits, the key size m is 64 bits,
$\alpha$ is 7, $\beta$ is 2, and the round function is applied at most 22 times to compute
a ciphertext output from the plaintext input.
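
Equations (1) and (2) translate directly into code. The following is a minimal sketch with our own function names, not the implementation used in the paper (which relies on the code of [7]). The key expansion follows our reading of the Speck specification [3] for the 64-bit key variant and is included only so that the published Speck32/64 test vector can be checked.

```python
MASK = 0xFFFF          # k = 16-bit words
ALPHA, BETA = 7, 2     # rotation constants of Speck32/64


def ror(v, r):
    return ((v >> r) | (v << (16 - r))) & MASK


def rol(v, r):
    return ((v << r) | (v >> (16 - r))) & MASK


def round_function(L, R, K):
    """Eqs. (1) and (2): one Speck round on the state (L, R) with subkey K."""
    L_next = ((ror(L, ALPHA) + R) & MASK) ^ K
    R_next = rol(R, BETA) ^ L_next
    return L_next, R_next


def inverse_round(L_next, R_next, K):
    """Undo one round."""
    R = ror(R_next ^ L_next, BETA)
    L = rol(((L_next ^ K) - R) & MASK, ALPHA)
    return L, R


def expand_key(key_words, rounds):
    """Key schedule for a 4-word key (l2, l1, l0, k0); reuses the round function."""
    l = list(reversed(key_words[:-1]))   # [l0, l1, l2]
    round_keys = [key_words[-1]]
    for i in range(rounds - 1):
        l[i % 3], k_next = round_function(l[i % 3], round_keys[i], i)
        round_keys.append(k_next)
    return round_keys


def encrypt(plaintext, round_keys):
    L, R = plaintext
    for K in round_keys:
        L, R = round_function(L, R, K)
    return L, R


if __name__ == "__main__":
    keys = expand_key((0x1918, 0x1110, 0x0908, 0x0100), 22)
    print([hex(w) for w in encrypt((0x6574, 0x694C), keys)])
```

Reducing the cipher to r rounds, as done for the distinguishers below, simply means passing only the first r round keys to `encrypt`.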


2.3 The Setup 


For the implementation of the Speck32/64 cipher and the distinguishers studied, as
well as for the algorithms needed for generating the datasets with a given input
difference and evaluating the results, this paper relies on the code provided by
the author of [7]¹. For all experiments, a training set of size $10^{7}$, a test
set of size $10^{6}$, and a batch size of 5000 were used, as in the previous related
work [7]. Finally, the experiments were run on an RTX 3090, and the code used to
conduct the experiments, as well as some figures, can be found in the associated
repository².


3 Related Works on Neural Speck Distinguishers 


Since the release of the lightweight block cipher Speck, differential and neural
distinguishers have been used to cryptanalyze it. First, at CRYPTO’19, Gohr proposed such
distinguishers, focusing on the input difference $\Delta_{in}$ = 0x0040/0000 [1]. The
author defined real pairs as ciphertext pairs $(C, C')$ resulting from encrypting
plaintext pairs $(P, P')$ where $P \oplus P' = \Delta_{in}$, and random pairs as
ciphertext pairs $(C, C')$ resulting from encrypting plaintext pairs $(P, P')$ with no
fixed input difference. Then, the author aimed to distinguish the real pairs from the
random pairs, deploying several methods that are described below. In the process, the
author compared the performance of a purely differential distinguisher to a neural
distinguisher for 5 to 8 rounds, showing that the neural distinguisher outperforms the
purely differential one. Those distinguishers were denoted $D_r$ and $N_r$ for
differential and neural distinguishers for Speck reduced to $r \in \{5, 6, 7, 8\}$ rounds,
respectively.


1 https://github.com/agohr/deep_speck.
2 https://github.com/NoricaBacuieti/TheSpeckAttack.


Purely Differential Distinguisher. First, the entire difference distribution
table (DDT) of Speck for the input difference $\Delta_{in}$ was computed under the
Markov assumption [13]. Then, to distinguish real ciphertext pairs from random
ones, the author first assumed that random ciphertext pair differences, i.e.,
$\Delta_{out} = C \oplus C'$, are distributed according to the uniform distribution. Next,
the author took the corresponding transition probability
$P(\Delta_{in} \to \Delta_{out})$ from the DDT, classifying the ciphertext pair difference
as real if $P(\Delta_{in} \to \Delta_{out}) > \frac{1}{2^{32}-1}$ and as random otherwise.
For more details, please refer to [7].


Gohr’s Neural Distinguisher. The proposed deep neural network is a residual 
network consisting of three types of blocks: an initial convolution, convolutional 
blocks, and a prediction head. Concretely, they are: 


1. Block 1: the initial convolution consisting of a 1D-CNN layer with kernel size 
1, 32 channels, padding, and stride of size 1, followed by batch normalization 
and a ReLU activation layer. 

2. Block 2-i: between one and ten convolutional residual blocks/units, each residual
block consisting of two 1D-CNN layers with kernel size 3, 32 channels, padding, and
stride of size 1, each followed by batch normalization and a ReLU activation layer.
These layers are then followed by an additional layer where the input of this block is
added to its output and passed to the input of the subsequent block. This last operation
makes the block, and thus also the network, residual; the input that skips all those
layers is called a residual connection.

3. Block 3: the prediction head consisting of two dense layers, having 64 neurons 
and followed by batch normalization and a ReLU activation layer each, closing 
with a dense layer of one neuron using a sigmoid activation function. 
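
To get a feeling for the size of the networks that are pruned later, the number of convolutional and dense weights can be counted from the block description above. The sketch below is an illustration only: it assumes that the 64 input bits are arranged as 16 positions with 4 channels (as in Gohr's public code, which is consistent with the 512-feature vector mentioned later), and it ignores the batch normalization parameters. All function names are ours.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 1D convolution."""
    return in_channels * out_channels * kernel_size + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def count_parameters(depth, in_channels=4, length=16, filters=32, dense=64):
    """Conv/dense parameter counts for a network with `depth` residual blocks."""
    block1 = conv1d_params(in_channels, filters, 1)
    block2 = depth * 2 * conv1d_params(filters, filters, 3)
    flat = length * filters  # flattened output fed to the prediction head
    block3 = (dense_params(flat, dense) + dense_params(dense, dense)
              + dense_params(dense, 1))
    return {"block1": block1, "block2": block2, "block3": block3,
            "total": block1 + block2 + block3}


if __name__ == "__main__":
    print(count_parameters(1))    # depth-1 distinguisher
    print(count_parameters(10))   # depth-10 distinguisher
```

Under these assumptions, most weights of the depth-1 network sit in the first dense layer of the prediction head rather than in the convolutions.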


The neural distinguishers give a score between 0 and 1, where a score greater 
than or equal to 0.5 classifies the sample as a real pair; otherwise, it is classified as 
random. Using this setup, neural distinguishers were trained for Speck reduced 
to 5 and 6 rounds, but different approaches were taken for Speck reduced to 7 
and 8 rounds. For Speck reduced to 7 rounds, key search was used to improve 
the accuracy of the neural distinguisher. For more details, the method described 
can be found in [7]. 

Moving to the neural distinguisher for 8 rounds, since the previously men- 
tioned approach did not improve this distinguisher’s performance, the neural dis- 
tinguisher for 8 rounds was obtained from the seven-round neural distinguisher 
using the staged training method. Again, more details can be found in [7]. 

Obtaining superior results compared to purely differential distinguishers indicated
that the neural distinguishers learn more than differential cryptanalysis.
It thus motivated A. Gohr to conduct the real differences experiment with the
goal of distinguishing real ciphertext pairs $(C, C')$ drawn from the real distribution
(again obtained from the $\Delta_{in}$ = 0x0040/0000 difference) from masked real
ciphertext pairs $(C \oplus M, C' \oplus M)$, where $M$ is a random 32-bit value. By
conducting this experiment, Gohr wanted to show that the previously obtained neural
distinguishers (without retraining) offer comparable results to key search. The
results for both the real-vs-random and the real differences experiments can be found
in [7].

Inspired by A. Gohr’s work, Benamira et al. [4] went further and developed 
an approach to estimate the property learned by Gohr’s deep neural network. 
Concretely, they replaced Gohr’s three building blocks with the following steps: 


1. Changing $(C, C')$ into $I = (\Delta L, \Delta V, V_0, V_1)$, where
$\Delta L = C_l \oplus C'_l$ is the addition modulo 2 between the left parts of $C$ and
$C'$, and $V_i = L_i \oplus R_i$ is the difference between the two parts of the internal
state at round $i$.

2. Changing the 512-feature vector [4] of the DNN into a feature vector of probabilities
$F = (P(\mathrm{Real} \mid I_{M_1}), P(\mathrm{Real} \mid I_{M_2}), \ldots, P(\mathrm{Real} \mid I_{M_m}))$.

3. Changing the final dense layer of the third building block into the Light 
Gradient Boosting Machine (LGBM) [12] model. 


The authors defined an output distribution table (ODT) directly on the values
$(\Delta L, \Delta V, V_0, V_1)$ instead of the DDT of the ciphertext pair difference
$(C_l \oplus C'_l, C_r \oplus C'_r)$. Then, they used the ODT to define a masked output
distribution table (M-ODT). This M-ODT is a compressed ODT where the input is not
$I = (\Delta L, \Delta V, V_0, V_1)$, but
$I_M = (\Delta L \wedge M_1, \Delta V \wedge M_2, V_0 \wedge M_3, V_1 \wedge M_4)$,
where $M \in \mathcal{M}_{hw}$, $M = (M_1, M_2, M_3, M_4)$, is an ensemble of four 16-bit
masks with Hamming weight hw (later set to 16 and 18). Then, by considering several masks,
they defined the set of relevant masks of $\mathcal{M}_{hw}$ as $R_M$, being able to
compute for each input $I$ the probability $P(\mathrm{Real} \mid I_M)$,
$\forall M \in R_M$ [4]. Having those defined, they developed a three-step approach for
recognizing the output of Speck reduced to 5 and 6 rounds as follows:


1. Extract the masks from Gohr’s DNN with dataset 1. 

2. Construct the M-ODT with dataset 2. 

3. Train the LGBM classifier from the probabilities stored in the M-ODT with 
dataset 3. 


Through this approach, they obtained results similar to Gohr’s DNN, thus show- 
ing that they have successfully modeled the DNN’s property. The results can be 
seen in [4]. 

They concluded by explaining how to improve A. Gohr’s results by means of 
creating batches of ciphertext inputs instead of pairs. They used two approaches 
for training and evaluating the M-ODT distinguisher: one where each element 
of the batch is given a score by the distinguisher and then takes the median of 
the results, and the other one where the whole batch is considered as a single 
input. For both methods, they obtained a 100% accuracy on 5 and 6 rounds, as 
well as on 7 rounds with the first method [4]. 

More recently, taking inspiration from both works mentioned above, Hou et al. [10] first
developed a SAT-based algorithm that returns input differences of high-probability
differential characteristics. They proposed an alternative format for the training and
test data, in which k ciphertext differences are grouped in a matrix and regarded as one
sample, and they used this type of sample to train the ResNet. Concretely, they tried it
with either the same input difference as A. Gohr or a better one chosen by their SAT-based
algorithm. Using this new data format in combination with the input differences suggested
by their algorithm, they obtained an accuracy of 88.19% and 56.49% for Speck reduced to 7
and 8 rounds, respectively, which is superior to A. Gohr’s results. More details can be
found in [10].


4 The Network Under Lens 


First, A. Gohr’s network containing ten blocks of type 2 and its performance on
Speck reduced to 7 and 8 rounds will be examined. Following this, the Lottery
Ticket Hypothesis will be evaluated for this depth-10 distinguisher using two different
pruning methods, one-shot pruning and iterative pruning, and the results will be analyzed.
After that, Gohr’s best network, the depth-1 distinguisher, containing one block of
type 2, will be examined in detail to see whether even this already small network can be
further pruned. For this purpose, the Lottery Ticket Hypothesis will be evaluated for this
network, followed by a computation of the average percentage of activations equal to zero
and pruning of the network.


4.1 The Initial Network 


First, we aim to reproduce the results given in [7] with the depth-10 neural 
distinguisher for Speck reduced to 5 and 6 rounds. In addition, we also want to 
see its performance for Speck reduced to 7 and 8 rounds by following the same 
training method as opposed to the approaches used in [7]. After having trained 
and evaluated the distinguishers five times, the results can be seen in Table 1. 


Table 1. Accuracies of the depth-10 Neural distinguishers for Speck32/64 reduced to 
5, 6, 7, and 8 rounds in the real-vs-random experiment. 


Distinguisher | Accuracy             | TPR                  | TNR
N_5           | 0.927 ± 1.46 × 10^-4 | 0.901 ± 3.92 × 10^-4 | 0.953 ± 5.86 × 10^-4
N_6           | 0.787 ± 3.90 × 10^-4 | 0.719 ± 9.66 × 10^-4 | 0.855 ± 7.45 × 10^-4
N_7           | 0.611 ± 4.17 × 10^-4 | 0.551 ± 1.98 × 10^-3 | 0.671 ± 1.90 × 10^-3
N_8           | 0.500 ± 7.53 × 10^-° | 0.368 ± 3.55 × 10^-1 | 0.632 ± 3.55 × 10^-1
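
The three columns of Table 1 can be computed from the scores of a distinguisher with a few lines of code. This is a minimal sketch with our own function name, assuming the 0.5 threshold described in Sect. 3 and labels 1 for real and 0 for random pairs.

```python
def evaluate(scores, labels, threshold=0.5):
    """Return (accuracy, TPR, TNR) for scores in [0, 1] and labels in {0, 1}."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, l in zip(predictions, labels) if p == 1 and l == 1)
    tn = sum(1 for p, l in zip(predictions, labels) if p == 0 and l == 0)
    n_real = sum(labels)
    n_random = len(labels) - n_real
    accuracy = (tp + tn) / len(labels)
    tpr = tp / n_real        # fraction of real pairs recognized as real
    tnr = tn / n_random      # fraction of random pairs recognized as random
    return accuracy, tpr, tnr


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(evaluate(scores, labels))
```

With balanced classes, the accuracy is the mean of TPR and TNR, which is why a network that labels nearly everything as one class still reports an accuracy of about 0.5.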


From these results, one can see that the results for Speck reduced to 5 and 6
rounds could be reproduced. What is more, for Speck reduced to 7 rounds, the
distinguisher reached an accuracy similar to the one in [7], where the author
used the approach mentioned in Sect. 3. Perhaps the more sophisticated approach
in [7] was used mainly to see whether it would improve the distinguisher’s
accuracy, but since the improvement is insignificant, we will use the same
training approach as for the first two distinguishing cases. For the N8
distinguisher, the improvement achieved by using the approach mentioned in
Sect. 3 managed to make the neural distinguisher slightly better than the
differential distinguisher. However, the gain is not considerable compared to
the N8 distinguisher trained using the same approach as for Speck reduced to 5
and 6 rounds. Since the N8 distinguisher without the training approach
mentioned above is no better than random guessing, results will still be given
for it, but the decisions will be based on the results of the other three
distinguishers.
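
As a quick consistency check on Table 1, the sketch below assumes that the
real-vs-random test set is balanced, that is, it contains equally many real and
random samples (an assumption made here, not something stated in this section).
In that case, the accuracy is the mean of TPR and TNR. The values are the means
copied from Table 1, and the helper name is purely illustrative.

```python
# Mean values copied from Table 1 (depth-10 distinguishers).
table1 = {
    "N5": {"acc": 0.927, "tpr": 0.901, "tnr": 0.953},
    "N6": {"acc": 0.787, "tpr": 0.719, "tnr": 0.855},
    "N7": {"acc": 0.611, "tpr": 0.551, "tnr": 0.671},
    "N8": {"acc": 0.500, "tpr": 0.368, "tnr": 0.632},
}


def balanced_accuracy(tpr, tnr):
    """Accuracy on a test set with equally many real and random samples
    (the balance is an assumption of this sketch)."""
    return (tpr + tnr) / 2.0


for name, row in table1.items():
    derived = balanced_accuracy(row["tpr"], row["tnr"])
    print(f"{name}: reported {row['acc']:.3f}, derived {derived:.3f}")
```

Under this assumption, the derived values agree with the reported accuracies to
three decimals. This includes the N8 case, where a TPR of 0.368 and a TNR of
0.632 average to 0.500, i.e., random guessing.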

With this, we turn to the next section, where we will look at the Lottery 
Ticket Hypothesis and the results obtained by effectuating the steps needed for 
evaluating it for the depth-10 and depth-1 versions of this distinguisher. 


4.2 The Lottery Ticket Hypothesis 


The Lottery Ticket Hypothesis (LTH) was first proposed by Frankle and Carbin 
in [6]. It was proposed after finding that appropriately initialized pruned net- 
works are capable of training effectively while achieving a comparable accuracy 
to the original network in a similar number of training epochs. It reads as follows: 


A randomly initialized dense neural network contains a subnetwork initialized 
such that - when trained in isolation - it can match the test accuracy of the 
original network after training for at most the same number of iterations. 


Thus, the reason for evaluating the LTH is to see whether:


1. Some subnetworks perform similarly to the baseline network for each of the
four distinguishers, and how much the performance decreases as the network
becomes sparser. The goal is to get an idea of the trade-off between the
network’s size and its performance.

2. There are winning tickets and whether their performance is significantly bet- 
ter than the baseline network. 

3. Similar conclusions to the ones in [6] can be drawn. Those are: 

— Iterative pruning finds winning tickets that match the accuracy of the 
baseline network at smaller network sizes than one-shot pruning. 
— Winning tickets are 10% (or less) to 20% of the baseline network’s size. 


As mentioned, these subnetworks are obtained by pruning, and in the fol- 
lowing subsections, two pruning strategies will be put under test for the LTH: 
one-shot pruning and iterative pruning. 


The Winning Tickets. According to [6], once the smallest-magnitude weights of
the trained baseline network have been pruned, we can define what a winning
ticket is. A winning ticket is a subnetwork that, when trained in isolation
after its remaining weights have been reset to the values they had in the
baseline network prior to training, provides classification accuracy equivalent
or superior to the baseline network’s.

Frankle and Carbin repeated the experiments with a random initialization of the
pruned network. However, the randomly initialized pruned network no longer
matched the performance of the trained (unpruned) baseline network, showing
that the pruned networks need to be appropriately initialized. Therefore, the
pruning strategies will be defined to reset the remaining weights of the
pruned network to the weights of the unpruned network prior to training.


One-shot pruning. In one-shot pruning, the baseline network is trained once, p% 
of the weights are pruned, and then the remaining weights are reinitialized to the 
weights of the baseline network prior to training. The process will be repeated 
for several values of p% to see the possible changes in the performance of the 
distinguisher. Please refer to [6] for the pseudocode and further details. 
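
The one-shot procedure described above can be sketched as follows. This is a
minimal NumPy illustration under stated assumptions, not the pseudocode of [6]
nor the code used for the experiments: the function names are hypothetical,
weights are plain arrays rather than Keras layers, and "training" is replaced
by a random perturbation of the initial weights.

```python
import numpy as np


def global_magnitude_masks(trained, prune_fraction):
    """Return 0/1 masks that drop the globally smallest-magnitude weights."""
    magnitudes = np.concatenate([np.abs(w).ravel() for w in trained])
    n_prune = int(round(prune_fraction * magnitudes.size))
    if n_prune == 0:
        return [np.ones_like(w) for w in trained]
    # Everything at or below the n_prune-th smallest magnitude is removed
    # (ties at the threshold would be removed as well).
    threshold = np.sort(magnitudes)[n_prune - 1]
    return [(np.abs(w) > threshold).astype(w.dtype) for w in trained]


def one_shot_ticket(initial, trained, prune_fraction):
    """Prune once using the trained weights, then reset survivors to `initial`."""
    masks = global_magnitude_masks(trained, prune_fraction)
    ticket = [m * w0 for m, w0 in zip(masks, initial)]
    return ticket, masks


rng = np.random.default_rng(0)
initial = [rng.normal(size=(8, 4)), rng.normal(size=(4, 2))]
# Stand-in for training: perturb the initial weights (no real optimizer here).
trained = [w + 0.5 * rng.normal(size=w.shape) for w in initial]

ticket, masks = one_shot_ticket(initial, trained, prune_fraction=0.5)
kept = sum(int(m.sum()) for m in masks)
total = sum(m.size for m in masks)
print(f"kept {kept} of {total} weights")
```

The returned ticket is the candidate subnetwork: it would then be trained in
isolation (with the masks kept fixed) and compared to the baseline.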


Iterative pruning. In iterative pruning, we again start from a baseline network
that is trained once. But unlike in one-shot pruning, where we start from the
same pretrained weights θ_0 each time we repeat the process with a different
value of p%, now, at each pruning trial τ ∈ {1, ..., t}, p% of the remaining
weights are pruned. Again, since there is no indication of what an appropriate
value for p% would be, one would have to try different values. However, if the
improvement in the winning tickets’ performance is not significant, the
experiments will be run for just one value of p%. For more details, please
refer to [6].
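
The iterative variant can be sketched in the same spirit. The code below is a
hedged illustration only: `toy_train` is a hypothetical placeholder for real
training, the weights are a single flat array, and the rounding of the number
of pruned weights per trial is an assumption of this sketch.

```python
import numpy as np


def prune_remaining(weights, mask, fraction):
    """Clear the mask for the smallest-magnitude `fraction` of surviving weights."""
    alive = np.flatnonzero(mask)
    n_prune = int(round(fraction * alive.size))
    order = np.argsort(np.abs(weights.ravel()[alive]))
    new_mask = mask.copy().ravel()
    new_mask[alive[order[:n_prune]]] = 0
    return new_mask.reshape(mask.shape)


def iterative_pruning(theta0, train, fraction, trials):
    """Repeat train -> prune -> reset to theta0 for the given number of trials."""
    mask = np.ones_like(theta0)
    history = []
    for _ in range(trials):
        # Each trial restarts from the original initialization, masked.
        trained = train(theta0 * mask, mask)
        mask = prune_remaining(trained, mask, fraction)
        history.append(float(mask.mean()))
    return theta0 * mask, mask, history


def toy_train(weights, mask):
    """Hypothetical stand-in for training: rescale the surviving weights."""
    return 1.5 * weights * mask


rng = np.random.default_rng(0)
theta0 = rng.normal(size=1000)
ticket, mask, history = iterative_pruning(theta0, toy_train, fraction=0.2, trials=9)
print([round(h, 3) for h in history])
```

In contrast to the one-shot sketch, the mask here shrinks a little in every
trial, so each pruning decision is based on a network that was retrained
without the weights removed earlier.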


Results. Here, we give the results obtained by evaluating the LTH for the N5,
N6, N7, and N8 distinguishers based on the depth-10 neural network.
Experiments with both one-shot pruning (depicted in yellow) and iterative
pruning (depicted in green) were conducted and compared to the results obtained
for the (unpruned) baseline model (depicted in black). Specifically, for each
distinguisher, the accuracy was computed and compared to that obtained with the
baseline network. The experiment was run five times per pruning ratio for each
pruning method, the results were averaged, and the minimum and maximum at
each pruning trial were indicated. The accuracies can be seen in Fig. 1.
For both pruning methods, there were 9 pruning trials, where:


1. For one-shot pruning: 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%, 
respectively, of the network was pruned. 

2. For iterative pruning: 20% of the remaining network’s weights were pruned
per trial.
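
To compare the two schedules, the short sketch below computes the cumulative
fraction of pruned weights after each of the 9 trials. It assumes that the 20%
in iterative pruning is applied to the weights that survived the previous
trial, as stated in the list above, and ignores rounding to whole weights.

```python
# Cumulative fraction of pruned weights after each of the 9 trials.
one_shot_pruned = [p / 100 for p in range(10, 100, 10)]
iterative_pruned = [1 - 0.8 ** trial for trial in range(1, 10)]

for trial, (one_shot, iterative) in enumerate(
    zip(one_shot_pruned, iterative_pruned), start=1
):
    print(f"trial {trial}: one-shot {one_shot:.0%}, iterative {iterative:.1%}")
```

Under these assumptions, nine iterative trials leave 0.8^9, or about 13.4%, of
the weights, which is close to but slightly above the 10% that remain after
the last one-shot trial.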


Looking at the results of the four distinguishers, two things can be observed 
immediately: at least up to 90% of the depth-10 network can be pruned without 
losing (on average) performance, and there are winning tickets that even slightly 
outperform (on average) the baseline network. Therefore, it can be empirically 
confirmed that the LTH does indeed find subnetworks (winning tickets) that will 
provide classification accuracy equivalent or superior to the baseline network. 

Turning to the findings of the authors of [6], they found that iterative
pruning finds winning tickets that match the accuracy of the baseline network
at smaller network sizes than one-shot pruning does. In addition, they found
that the winning tickets are 10% (or less) to 20% of the baseline network’s
size. These findings could not be entirely confirmed by the results above nor
by the ones presented in the next subsection, where the depth-1 network (which
can be regarded as an already 90% pruned version of this depth-10 network) is
examined. First, iterative pruning does not seem to be superior to one-shot
pruning at finding winning tickets that match the accuracy of the baseline
network at smaller network sizes. Looking at the results, iterative pruning
is either outperformed by one-shot pruning or only barely outperforms it.
Perhaps more trials per pruning ratio are needed to draw a firm conclusion.
However, since iterative pruning is, as noted by the authors of [6], costly,
we conclude for now that both pruning methods perform similarly at finding
winning tickets at smaller network sizes. Second, the results presented above
and in the next subsection show that winning tickets can be found, in general,
at every pruning ratio by both pruning methods.

[Figure 1, four panels: (a) the accuracy of N5, (b) the accuracy of N6, (c) the
accuracy of N7, and (d) the accuracy of N8, each after pruning p% of the
network.]


Fig. 1. The accuracies obtained after evaluating the LTH for the depth-10 N5,
N6, N7, and N8 distinguishers.


While, on the one hand, we have looked at whether similar conclusions to
the ones in [6] could be drawn, on the other hand, some of their conclusions
were directly taken into account when running the experiments. First, while
the globally smallest-magnitude weights were pruned in both pruning methods,
the authors also experimented with pruning the smallest weights per layer with
the same ratio, finding that for ResNet-18 and VGG-19, global pruning finds
smaller winning tickets. They explained that some layers have far more
parameters than others and that when all layers are pruned with the same ratio,
the smaller layers become bottlenecks. Since some layers of the depth-10
baseline network showed a similar disparity in the number of parameters as in
the networks studied by them, the experiments were run directly with the
globally smallest-magnitude weights pruning approach to avoid the pitfall of
having such bottlenecks.
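
The bottleneck argument can be illustrated with a small, purely hypothetical
example: one large layer with small weights and one small layer with larger
weights. The layer shapes, weight scales, and function names below are
assumptions chosen for illustration and do not correspond to the depth-10
network.

```python
import numpy as np


def layerwise_masks(layers, fraction):
    """Prune the same fraction of smallest-magnitude weights in every layer."""
    masks = []
    for w in layers:
        k = int(round(fraction * w.size))
        threshold = np.sort(np.abs(w).ravel())[k - 1] if k > 0 else -np.inf
        masks.append(np.abs(w) > threshold)
    return masks


def global_masks(layers, fraction):
    """Prune the smallest-magnitude weights across all layers at once."""
    magnitudes = np.concatenate([np.abs(w).ravel() for w in layers])
    k = int(round(fraction * magnitudes.size))
    threshold = np.sort(magnitudes)[k - 1] if k > 0 else -np.inf
    return [np.abs(w) > threshold for w in layers]


rng = np.random.default_rng(1)
# Hypothetical layers: 4096 small-scale weights and 64 larger-scale weights.
layers = [0.1 * rng.normal(size=(64, 64)), rng.normal(size=(8, 8))]

for name, make_masks in [("layer-wise", layerwise_masks), ("global", global_masks)]:
    kept = [int(m.sum()) for m in make_masks(layers, 0.9)]
    print(f"{name}: kept per layer = {kept}")
```

With layer-wise pruning at 90%, the small layer is left with only 6 of its 64
weights, whereas global pruning removes most weights from the large layer and
leaves the small layer largely intact in this toy setting.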

Second, the authors also found that the initial learning rate matters for the
LTH’s success. When starting from a higher learning rate for ResNet-18 and
VGG-19, the performance of the networks obtained with iterative pruning was no
better than that of randomly reinitialized pruned networks (random guessing).
However, they found that at a lower learning rate, the subnetworks remain
within one percentage point of the baseline network’s accuracy. Although they
do not give the intuition behind this result, since the point of the LTH is to
find subnetworks that match or outperform the baseline network’s performance,
it makes sense to choose a (lower) learning rate that allows the model to learn
a more optimal set of weights. The baseline network proposed by A. Gohr already
started with a small learning rate of 0.002, which further decreased to 0.0001,
and, as seen, iterative pruning did indeed find winning tickets.


4.3 The Smaller Network 


Having seen that at least 90% of the depth-10 network can be pruned (even 
with some minor improvement on average), we now turn to the best network A. 
Gohr has found, namely, the version with only one block of type 2, the depth- 
1 network. Again, first, we will try to reproduce the results Gohr obtained for 
Speck reduced to 5 and 6 rounds, and then see the distinguisher’s performance on 
Speck reduced to 7 and 8 rounds using the same training approach as for the first 
two distinguishing cases. The results, obtained by training and evaluating the
distinguishers five times, can be seen in Table 2.


Table 2. Accuracies of the depth-1 Neural distinguishers for Speck32/64 reduced to 
5, 6, 7, and 8 rounds in the real-vs-random experiment. 


Distinguisher   Accuracy               TPR                    TNR
N5              0.927 ± 1.46 × 10^-4   0.897 ± 1.06 × 10^-?   0.954 ± 8.45 × 10^-4
N6              0.783 ± 1.39 × 10^-4   0.717 ± 1.34 × 10^-?   0.850 ± 1.11 × 10^-3
N7              0.608 ± 9.91 × 10^-4   0.542 ± 3.99 × 10^-?   0.674 ± 4.56 × 10^-3
N8              0.500 ± 1.52 × 10^-4   0.51 ± 1.92 × 10^-1    0.489 ± 1.92 × 10^-1


As expected from the results of the previous subsection, as well as from the
results that Gohr obtained for the depth-1 N5 and N6 distinguishers, the
accuracy of the depth-1 N5 and N6 distinguishers remained similar to that of
the depth-10 distinguishers. What is more, the accuracy of the N7 distinguisher
decreased by only around one percentage point, which can be considered an
insignificant decrease. Having seen that reducing the depth-10 network to
depth-1 does not affect the distinguishers’ performance significantly, the next
question was whether the depth-1 network can be pruned even more at an
insignificant performance loss. The LTH with the two pruning methods was
evaluated again for this depth-1 network to determine whether this is the case.
The accuracies can be seen in Fig. 2. They show, again, that one could prune
even 90% of this small network without losing performance (on average).
Therefore, in the next subsections, we will look at the importance of each
major part of this smaller network and how it affects the performance of the
distinguishers.


4.4 How Much Smaller Can We Go? 


To find the answer to this question, we will first look at the activation map of 
each layer of the four neural distinguishers to see how much they learn at each 
layer. The idea is that if we see completely black activation maps, nothing is 
learned at that layer so that it can be pruned entirely. However, if there is only 
some activation present, some channels/neurons of that layer can be pruned, 
and how much it can be pruned in such cases needs to be determined through 
experiments. In this paper, results will be given for two cases that show that 
the performance is marginally affected even when the depth-1 network is pruned 
significantly. 

To compute the activation maps, the keract³ [14] library was used, and they
can be seen in the associated repository. For each of the four distinguishers,
five activation maps were computed, where A1, A2, and A3 correspond to the
activation maps of the three convolutional layers, and A4 and A5 correspond
to the activation maps of the two dense layers. However, since the results were
similar, just the activation maps of the Ns distinguisher are given. 

After examining them, the findings confirm the results obtained with the
LTH, according to which even this small network can be further pruned.
Concretely, the A1 activation maps have around 13-15 channels with no
activation or an insignificant number of activations, the A2 activation maps
have 12-18 such channels, and the A3 activation maps have 6-10 such channels.
Then, looking at the A4 and A5 activation maps, there are around 10 and 20-25
neurons, respectively, that show some activation.

Now, even though these were the activation maps for just one input value
each, the results were similar for all three distinguishers, so we proceed to
the pruning step. As mentioned, besides the maps with no activation, which do
not influence the performance of the distinguishers, some maps have almost no
activation but contain one or a few large activation values. First, to confirm
that the performance will not be affected, we train a network from which the
minimum number of empty channels/neurons has been removed from each layer.


3 https://pypi.org/project/keract/4.4.0/.

Fig. 2. The accuracies obtained after evaluating the LTH for the depth-1 N5,
N6, N7, and N8 distinguishers.


Then, going to the other extreme, the maximum number of empty channels/neurons
over all three distinguishers will be pruned from each layer to see how
much the performance is affected and whether a finer-grained pruning approach
is needed. To conduct these experiments, the kerassurgeon⁴ library will be
used. However, since it does not support residual connections, we will first
see whether they have a significant impact on the performance of the
distinguishers. The results of training the depth-1 distinguishers with no
residual connection can be seen in Table 3.

As can be seen, the accuracy of the distinguishers did not decrease, which
indicates that a residual connection is not necessary. This was also expected
since the purpose of residual connections is to allow the training of very deep
neural networks. In a nutshell, those residual connections mitigate the
vanishing gradients
and accuracy saturation problems by allowing an alternate path for gradients 
to flow through and allowing the model to learn an identity function [8]. This 
ensures that the higher layers will perform at least as well as the lower (deeper) 


4 https://pypi.org/project/kerassurgeon/.

Table 3. Accuracies of the depth-1 Neural distinguishers with no residual connection
for Speck32/64 reduced to 5, 6, 7, and 8 rounds in the real-vs-random experiment.


Distinguisher   Accuracy               TPR                    TNR
N5              0.925 ± 1.44 × 10^-4   0.897 ± 1.30 × 10^-?   0.954 ± 1.24 × 10^-?
N6              0.784 ± 3.35 × 10^-4   0.714 ± 5.68 × 10^-4   0.855 ± 8.07 × 10^-4
N7              0.608 ± 2.55 × 10^-?   0.542 ± 6.00 × 10^-3   0.671 ± 4.94 × 10^-?
N8              0.500 ± 1.36 × 10^-4   0.57 ± 4.48 × 10^-1    0.43 ± 4.50 × 10^-1


layers. However, since we have such a small network, the benefits of using a 
residual connection vanish, allowing us to eliminate it without compromising 
the distinguishers’ performance. Since the residual connection does not impact 
the distinguishers’ performance, we will start pruning the depth-1 network with 
no residual connection to see how much smaller we can go. The results can be
seen in Tables 4, 5, 6, and 7.
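
The role of the shortcut described above can be made concrete with a minimal
NumPy sketch. It is not the architecture of the distinguisher: the two-layer
transformation, the shapes, and the function names are hypothetical. It only
shows that a residual block reduces to the identity when its inner
transformation outputs zero, while a plain block would output zero.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def plain_block(x, w1, w2):
    """F(x): a small two-layer transformation without a shortcut."""
    return relu(x @ w1) @ w2


def residual_block(x, w1, w2):
    """y = x + F(x): the shortcut passes the input through unchanged."""
    return x + plain_block(x, w1, w2)


x = np.array([[1.0, -2.0, 0.5]])
zeros = np.zeros((3, 3))

# With all-zero weights, F(x) = 0, so the residual block is the identity.
print(residual_block(x, zeros, zeros))
print(plain_block(x, zeros, zeros))
```

In a very deep network, this makes it easy for additional blocks to do no harm.
With a single block, as in the depth-1 network, there is little for the
shortcut to protect, which is in line with the results in Table 3.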


Table 4. Accuracies and pruned channels/neurons of each layer of the depth-1 Neural 
distinguisher with no residual connection for Speck32/64 reduced to 5 rounds in the 
real-vs-random experiment for different APoZ values. 


APoZ   Accuracy               C1     C2     C3     D1     D2
1      0.925 ± 1.58 × 10^-4   6.4    2      2      26     3.4
0.9    0.924 ± 1.50 × 10^-?   6.2    9.6    11     24.6   3.2
0.8    0.920 ± 1.76 × 10^-?   6      14     23.2   45.6   6.6
0.7    0.904 ± 3.05 × 10^-?   11.6   20.4   28.4   53.6   10.6


Table 5. Accuracies and pruned channels/neurons of each layer of the depth-1 Neural 
distinguisher with no residual connection for Speck32/64 reduced to 6 rounds in the 
real-vs-random experiment for different APoZ values. 


APoZ   Accuracy               C1     C2     C3     D1     D2
1      0.785 ± 9.06 × 10^-4   6.4    6      2.2    22     44
0.9    0.783 ± 3.02 × 10^-?   5.4    11.2   14.4   21.8   7
0.8    0.780 ± 1.32 × 10^-?   3.6    15     21.8   40.4   10.4
0.7    0.737 ± 1.48 × 10^-?   10     20     28.6   56     30.4


Using kerassurgeon, the layers of each of the four distinguishers were pruned
based on the average percentage of activations equal to zero (APoZ) described
in [11]. In the tables, the accuracies for each distinguisher and the average
number of channels/neurons that were pruned from each of the five layers are
presented.


Table 6. Accuracies and pruned channels/neurons of each layer of the depth-1 Neural
distinguisher with no residual connection for Speck32/64 reduced to 7 rounds in the
real-vs-random experiment for different APoZ values.


APoZ   Accuracy               C1     C2     C3     D1     D2
1      0.607 ± 3.23 × 10^-3   8.2    15     12.2   16.2   27.8
0.9    0.608 ± 1.23 × 10^-?   6.8    17.2   17.8   26.2   31.4
0.8    0.601 ± 8.78 × 10^-3   6.6    20.2   24.8   40.8   35.6
0.7    0.597 ± 5.25 × 10^-?   12     25.6   27.4   49.6   43


Table 7. Accuracies and pruned channels/neurons of each layer of the depth-1 Neural 
distinguisher with no residual connection for Speck32/64 reduced to 8 rounds in the 
real-vs-random experiment for different APoZ values. 


APoZ   Accuracy               C1     C2     C3     D1     D2
1      0.500 ± 2.80 × 10^-4   0.8    0      0      2.2    15.4
0.9    0.500 ± 2.32 × 10^-4   2.2    0.2    1.2    2.4    20.2
0.8    0.500 ± 3.23 × 10^-4   2.2    12     2      11.8   24
0.7    0.500 ± 1.09 × 10^-4   5.6    8.4    7      17.2   32.6


As suspected, the depth-1 network can be pruned even further without
significantly impacting the distinguishers’ performance. One can see that when
the channels/neurons with an APoZ value greater than or equal to 0.7 were
removed from the N6 distinguisher, the accuracy decreased by five percentage
points. In contrast, for greater cutoff values, the accuracy decreased by less
than one percentage point. Now, seeing that the depth-1 network can be pruned,
we will decide how much to prune the network before moving to the next section.
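
The APoZ criterion behind Tables 4 to 7 can be sketched as follows. This is an
illustrative NumPy version under stated assumptions and does not use the
kerassurgeon API: activations are taken after the ReLU, the last axis indexes
channels (or neurons), and the synthetic activations and function names are
hypothetical.

```python
import numpy as np


def apoz_per_channel(activations):
    """Fraction of zero activations per channel/neuron.

    `activations` holds post-ReLU outputs with channels on the last axis,
    e.g. (samples, positions, channels) or (samples, neurons).
    """
    axes = tuple(range(activations.ndim - 1))
    return (activations == 0).mean(axis=axes)


def channels_to_prune(activations, cutoff):
    """Indices of channels whose APoZ is greater than or equal to `cutoff`."""
    return np.flatnonzero(apoz_per_channel(activations) >= cutoff)


rng = np.random.default_rng(0)
# Synthetic post-ReLU activations: roughly half of the entries are zero.
acts = np.maximum(rng.normal(size=(1000, 16, 4)), 0.0)
acts[:, :, 1] = 0.0                               # a completely dead channel
acts[:, :, 3] *= rng.random((1000, 16)) < 0.1     # a channel that rarely fires

print(np.round(apoz_per_channel(acts), 2))
print(channels_to_prune(acts, cutoff=1.0), channels_to_prune(acts, cutoff=0.9))
```

A cutoff of 1 removes only channels that never activate, while lower cutoffs
also remove channels that activate rarely, which matches the trend in the
tables that more channels/neurons are pruned as the cutoff decreases.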

Since we saw the accuracy decrease at an APoZ cutoff of 0.7, we look at the
number of channels/neurons pruned above this cutoff across the first three
distinguishers. Two experiments were run in which the distinguishers’ layers
were pruned in two ways: one in which the smallest number of channels/neurons
per layer across all three distinguishers (for an APoZ value equal to 1) was
pruned, and one in which the largest number (for an APoZ value equal to 0.8)
was pruned. We call these processes min-pruning and max-pruning, and the
results can be seen in Tables 8 and 9.


Table 8. Accuracies of the min-pruned depth-1 Neural distinguishers with no residual 
connection for Speck32/64 reduced to 5, 6, 7, and 8 rounds in the real-vs-random 
experiment. 


Distinguisher   Accuracy               TPR                    TNR
N5              0.923 ± 1.67 × 10^-3   0.890 ± 3.52 × 10^-3   0.955 ± 5.11 × 10^-4
N6              0.782 ± 6.27 × 10^-4   0.713 ± 1.19 × 10^-3   0.850 ± 6.12 × 10^-4
N7              0.605 ± 1.75 × 10^-3   0.546 ± 3.70 × 10^-?   0.664 ± 4.26 × 10^-3
N8              0.500 ± 1.99 × 10^-4   0.54 ± 5.36 × 10^-1    0.44 ± 2.50 × 10^-1


Table 9. Accuracies of the max-pruned depth-1 Neural distinguishers with no residual
connection for Speck32/64 reduced to 5, 6, 7, and 8 rounds in the real-vs-random
experiment.


Distinguisher   Accuracy               TPR                    TNR
N5              0.915 ± 1.41 × 10^-?   0.875 ± 1.54 × 10^-?   0.955 ± 1.50 × 10^-?
N6              0.770 ± 4.24 × 10^-?   0.691 ± 8.84 × 10^-?   0.848 ± 1.37 × 10^-4
N7              0.596 ± 7.70 × 10^-?   0.543 ± 7.40 × 10^-?   0.648 ± 1.63 × 10^-?
N8              0.500 ± 1.65 × 10^-4   0.54 ± 2.83 × 10^-1    0.460 ± 2.83 × 10^-1


While the performance was expected not to be impacted in the first case, the
second experiment was run more out of sheer curiosity to see how the
performance would change. As expected, in the first case, the performance
remained within one percentage point; surprisingly, it also remained within one
percentage point in the second case. For all experiments conducted in this
paper (unless otherwise specified), the results are the average of five trials,
which was considered appropriate given the time some of the experiments took
(see the associated repository for details). However, the results might differ
slightly if more trials per experiment were conducted. Nevertheless, we keep
the max-pruned network, in which we remove 7 channels from C1, 21 from C2, 25
from C3, 46 neurons from D1, and 36 from D2.
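
The numbers kept for the max-pruned network can be retraced from the rows for
an APoZ value of 0.8 in Tables 4, 5, and 6. The sketch below assumes that the
per-layer maximum of the averaged counts is rounded up to the next integer
(the rounding rule is an assumption, as it is not stated in the text) and takes
the C1 entry of the N6 distinguisher as 3.6.

```python
import math

layers = ["C1", "C2", "C3", "D1", "D2"]

# Average number of pruned channels/neurons at an APoZ value of 0.8,
# read from Tables 4, 5, and 6 (N5, N6, and N7, respectively).
pruned_at_08 = {
    "N5": [6.0, 14.0, 23.2, 45.6, 6.6],
    "N6": [3.6, 15.0, 21.8, 40.4, 10.4],
    "N7": [6.6, 20.2, 24.8, 40.8, 35.6],
}

# Max-pruning: per layer, take the largest count over the three distinguishers
# and round up (assumed rounding rule).
max_pruned = {
    layer: math.ceil(max(row[i] for row in pruned_at_08.values()))
    for i, layer in enumerate(layers)
}
print(max_pruned)
```

Under these assumptions, the result is 7, 21, 25, 46, and 36 for C1, C2, C3,
D1, and D2, which agrees with the numbers stated above.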

Finally, satisfied that we could even further prune the depth-1 network with 
no residual connection while the performance remained within one percentage 
point, we will move to the next section. 


5 Visualizing the Important Features 


In this section, we will look at whether prior feature engineering improves
the performance of our distinguishers and whether all 64 input bits are needed
for classification. A trained encoder will be used to preprocess the input, and
LIME will be used to assess the feature importance. Finally, the experiments
will be conducted on the max-pruned depth-1 network with no residual connection
(also referred to as the pruned network).


5.1 Feature Engineering Using an Autoencoder 


Here, we will look at whether prior input engineering will improve the
performance and, for this purpose, autoencoders of various compression
capacities have been trained. An autoencoder is a neural network that learns to
reproduce its input at its output, and it comprises two parts: an encoder and a
decoder. The encoder compresses the input to a latent representation, that is,
an encoding that contains all the important information needed to represent the
input, and the decoder takes this latent representation and tries to
reconstruct the input [2]. Autoencoders were chosen to perform feature
engineering because they learn a latent representation that ignores noise; we
anticipated that the network’s performance would improve by bringing the useful
features forward. The autoencoders corresponding to the results presented
in Tables 10, 11, and 12 consist of one, two, and three blocks, respectively,
each block comprising:


1. A 1D-CNN layer with kernel size 3, 32 channels, padding and stride of size 
1, followed by a batch normalization and a ReLU activation layer. 
2. A 1D-MaxPooling/1D-UpSampling layer with pool-size/size 2. 


Table 10. Accuracies of the one-block autoencoder for Speck32/64 reduced to 5, 6, 7, 
and 8 rounds in the real-vs-random experiment. 


Distinguisher   Accuracy               TPR                    TNR
N5              0.999 ± 2.10 × 10^-?   0.999 ± 4.41 × 10^-?   0.999 ± 4.72 × 10^-?
N6              0.999 ± 4.94 × 10^-?   0.999 ± 3.26 × 10^-?   0.999 ± 2.60 × 10^-?
N7              0.999 ± 1.25 × 10^-?   0.999 ± 4.83 × 10^-?   0.999 ± 5.58 × 10^-5
N8              0.999 ± 2.38 × 10^-5   0.999 ± 1.11 × 10^-4   0.999 ± 1.44 × 10^-4


Table 11. Accuracies of the two-block autoencoder for Speck32/64 reduced to 5, 6, 7, 
and 8 rounds in the real-vs-random experiment. 


Distinguisher   Accuracy               TPR                    TNR
N5              0.999 ± 1.00 × 10^-?   0.999 ± 6.54 × 10^-7   0.999 ± 1.53 × 10^-?
N6              0.999 ± 3.68 × 10^-7   0.999 ± 3.83 × 10^-7   0.999 ± 4.82 × 10^-?
N7              0.999 ± 2.31 × 10^-?   0.999 ± 2.76 × 10^-?   0.999 ± 2.32 × 10^-?
N8              0.999 ± 5.93 × 10^-?   0.999 ± 6.03 × 10^-?   0.999 ± 5.99 × 10^-?


Table 12. Accuracies of the three-block autoencoder for Speck32/64 reduced to 5, 6, 
7, and 8 rounds in the real-vs-random experiment. 


Distinguisher   Accuracy               TPR                    TNR
N5              0.889 ± 1.98 × 10^-?   0.893 ± 1.86 × 10^-?   0.885 ± 2.20 × 10^-?
N6              0.895 ± 2.30 × 10^-?   0.904 ± 1.96 × 10^-?   0.886 ± 2.94 × 10^-?
N7              0.871 ± 1.11 × 10^-?   0.861 ± 3.61 × 10^-?   0.881 ± 2.70 × 10^-?
N8              0.896 ± 4.03 × 10^-?   0.879 ± 5.69 × 10^-?   0.914 ± 3.39 × 10^-?


A prior reshaping and permutation of the input were also performed as in [7].
Concretely, starting from four 16-bit strings, the encoder of the one-block
autoencoder compressed them into four 8-bit strings, the encoder of the
two-block autoencoder compressed them into four 4-bit strings, and the encoder
of the three-block autoencoder compressed them into four 2-bit strings. The
results show that a convolutional encoder manages to learn a latent
representation of the input that allows the convolutional decoder to
reconstruct it almost perfectly in the one- and two-block cases and reasonably
well in the three-block case. Having seen these results, the encoders,
pretrained on a separate training set, were added as a preprocessing step to
the pruned network, and training was performed again to see the effect.
Performance close to the one we already saw (or even slightly better) was
expected. However, preliminary runs show quite the contrary. Even though the
pretrained one- and two-block encoders were used as a preprocessing step (as
they were the most promising ones), the results do not show an improvement in
the performance of the distinguishers. What is more, the results are not even
comparable to the ones we already saw for the pruned network.
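
The compression factors mentioned above follow from the pooling layers alone,
which the sketch below retraces. It is a deliberately reduced illustration: the
convolution, batch normalization, and ReLU layers (and hence the 32 channels)
are left out, the input is random, and the function names are hypothetical.
Only the length along the bit axis is tracked.

```python
import numpy as np


def max_pool_1d(x, pool=2):
    """Max-pool along the last axis with a non-overlapping window."""
    length = x.shape[-1] // pool * pool
    windows = x[..., :length].reshape(*x.shape[:-1], length // pool, pool)
    return windows.max(axis=-1)


def upsample_1d(x, size=2):
    """Repeat every position `size` times along the last axis."""
    return np.repeat(x, size, axis=-1)


rng = np.random.default_rng(0)
words = rng.integers(0, 2, size=(4, 16)).astype(float)  # four 16-bit strings

for blocks in (1, 2, 3):
    latent = words
    for _ in range(blocks):          # encoder side: one pooling step per block
        latent = max_pool_1d(latent)
    restored = latent
    for _ in range(blocks):          # decoder side: one upsampling step per block
        restored = upsample_1d(restored)
    print(blocks, latent.shape, restored.shape)
```

Each block halves the length, giving 8, 4, and 2 positions per word for one,
two, and three blocks, and the upsampling steps of the decoder restore the
original length of 16.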

For instance, for the pruned network with a one-block encoder as a
preprocessor, the N5 distinguisher’s accuracy was 88%, and for the pruned
network with a two-block encoder as a preprocessor, it was 82%. It seems that,
even though the encoder managed to learn an efficient latent representation,
once the input was transformed by the encoder, the pruned network did not have
the complexity to decompose the engineered input and recombine it in a useful
way. Following this intuition, the one-block encoder was added as a
preprocessor to the original depth-10 network, and the accuracy of the N5
distinguisher indeed improved compared to the pruned network; namely, it
reached 92%. As suspected, when adding an encoder to perform feature
engineering, the network that does the classification indeed needs to be
complex enough to extract useful information from the latent representation.


5.2 Feature Visualization with LIME 


Next, we will look closer at the input features and their importance to try to 
gain insights into the (pruned) distinguishers’ behavior, which might aid in the 
improvement of future preprocessing methods. We will do this using one of the
state-of-the-art explanation techniques called Local Interpretable Model-agnostic
Explanations (LIME) [15]. In short, according to [15], LIME explains the pre- 
dictions of any classifier in an interpretable and faithful manner by learning an 
interpretable model locally around the prediction. Using it, the feature impor- 
tances for all four distinguishers were computed, giving explanations for the 
five best predictions belonging to class 1 (fixed difference) and class 0 (random 
difference). In the associated repository, results are given just for the five best 
predictions of class 1 of the N5 and Ng distinguishers as the feature importance 
was similar. LIME with a submodular pick was also run [15]. However, even 
though it selected the instances judiciously, the results were similar to what was 
obtained so far, thus not contributing to a greater understanding of the
distinguishers’ behavior. From the figures given in the associated repository,
we see that the importance of each feature is insignificant and that it varies
from distinguisher to distinguisher (even from instance to instance). It seems
that there is
no clear local region that would have a considerable impact on the classification. 
Now, LIME already samples from both the vicinity of the instance and further 
away from it. Still, it is possible that even larger regions need to be considered 
for the important features to become evident to the explainer. 

Then, even though the distinguisher performs quite well for Speck reduced to 5
rounds, the explainer suggests that removing any (or even all) of the 64
features will only insignificantly affect the classifier’s performance. Having
obtained these results even for a fairly good distinguisher, and given the type
of explainer LIME currently uses, it might well be that the linear explanation
model cannot explain the distinguishers’ behavior, as there might be no linear
boundary to begin with. For now, as LIME could not identify the important
features, we are left with the conclusion that all 64 inputs are important.
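
The conjecture that a linear explanation model may fail when there is no linear
boundary can be illustrated with a toy experiment. The sketch below is not the
lime package and not the setup used for the distinguishers: it fits a plain
least-squares surrogate on random bit-flip perturbations of one instance,
without LIME's locality kernel or feature selection, and it uses a parity
function on 8 bits as a hypothetical stand-in for a highly nonlinear
classifier.

```python
import numpy as np


def local_linear_surrogate(predict, instance, n_samples=4000, seed=0):
    """Fit a linear model to `predict` on bit-flip perturbations of `instance`.

    Returns one coefficient per input bit; keep[i] = 1 means that bit i has
    the same value as in the instance. Simplified: all samples get equal weight.
    """
    rng = np.random.default_rng(seed)
    keep = rng.integers(0, 2, size=(n_samples, instance.size))
    perturbed = np.where(keep == 1, instance, 1 - instance)
    targets = np.asarray(predict(perturbed), dtype=float)
    design = np.hstack([keep, np.ones((n_samples, 1))])  # last column: intercept
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1]


def linear_model(x):
    """Hypothetical black box whose output is linear in the input bits."""
    return x @ np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])


def parity_model(x):
    """Hypothetical black box that depends on every bit, but not linearly."""
    return x.sum(axis=1) % 2


instance = np.array([1, 0, 1, 1, 0, 0, 1, 0])
print(np.round(local_linear_surrogate(linear_model, instance), 3))
print(np.round(local_linear_surrogate(parity_model, instance), 3))
```

For the linear black box, the surrogate recovers the influence of each bit. For
the parity function, every bit matters, yet all coefficients come out close to
zero. A near-zero importance for every feature is therefore compatible with
all features being important, which is consistent with the conclusion drawn
above.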


6 Conclusions and Future Work 


In this paper, the distinguisher proposed by A. Gohr [7] was studied with the
aim of finding a better-performing or smaller distinguisher for Speck32/64. To
this end, the Lottery Ticket Hypothesis was evaluated for the first time for
this distinguisher, revealing that even the depth-1 version can be further
pruned without significantly compromising the performance, which empirically
confirms the hypothesis anew. Then, based on the conclusions of prior
experiments, the depth-1 network was successfully pruned to potentially aid in
explaining its behavior, and we saw how pruning up to the suggested limit
affects the performance.

Next, we studied whether prior feature engineering would result in a
performance gain. In the process, convolutional autoencoders of various
compression capacities that successfully reconstruct the inputs were found for
the first time, and their trained encoders were used as a preprocessor prior to
training the pruned depth-1 network. Results have shown that even though
convolutional autoencoders manage to learn a latent representation that they
can nearly perfectly decode, the network’s performance decreased when the
encoded inputs were passed to the pruned depth-1 network. This led to the
suspicion that the pruned network did not have the necessary complexity to
extract useful information from the encoded inputs, which was later confirmed
by additional experiments.

As a follow-up, intending to explain the distinguisher’s behavior, the
classification explainer LIME was deployed for the first time in this setting.
Results showed that, despite the pruned depth-1 distinguisher performing
reasonably well, LIME considered that none of the 64 inputs impacted the
classification outcome. This suggests that a stronger explainer than the one
LIME currently uses is needed. We suspect two possible causes (mentioned in
Sect. 5) for LIME’s current results, which are yet to be studied.

One direction for future work would be to train the most recent networks 
used for image recognition to see whether a better performance can be achieved. 
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Then, since there are still instances classified with high confidence as belonging to 
the opposite class, a second suggestion would be to look at ensemble learning to 
see whether it could alleviate the problem. Moreover, since the evaluation of the 
LTH revealed that the depth-1 neural distinguisher could be further pruned, it 
would be interesting to consider evaluating it for different neural distinguishers. 
Finally, combining the SAT-based algorithm [10] with the framework presented 
in [16] for extending the differential attack to more rounds would be interesting 
as well. 
Abstract. Recent advances in the development of quantum computers
underline the urgency of initiating the transition from classical public key
cryptography to quantum secure algorithms. Therefore, NIST has ini- 
tiated a post-quantum cryptography standardization process which is 
currently in its third and final round. One of the Key Encapsulation 
Mechanism (KEM) candidates is BIKE. In this paper we optimize the 
algorithm to achieve new speed-records for constant-time implementa- 
tions of BIKE with parameter set bikel1 on two different embedded 
architectures. For the ARM Cortex-M4 we leverage the performance 
benefit of bit-polynomial multiplication in radix-16 to outperform exist- 
ing implementations. We explore different algorithmic approaches on the 
RISC-V-based VexRiscv platform and implement parts of the standard 
RISC-V Bitmanip Extension to measure its impact on BIKE. Our results 
indicate boundaries and trade-offs between different approaches for bit- 
polynomial multiplication beyond the BIKE use-case. 


Keywords: NIST PQC Standardization · Constant-Time Implementation ·
Cortex-M4 · RISC-V · Polynomial Multiplication


1 Introduction 


The NIST standardization process was initiated in 2017 in the face of the ongo- 
ing development in quantum computing which threatens the security of tradi- 
tional public key cryptography like RSA. Currently, the standardization is in 
its third and (presumably) final round. The stated goal is to standardize one
or more Key Encapsulation Mechanisms (KEM) and Digital Signature Schemes
(DSS) from the remaining 9 KEMs (4 finalists and 5 alternate candidates) and 6
DSSs (3 finalists and 3 alternate candidates). The code-based BIKE [2] scheme
has been selected as an alternate KEM candidate. Compared to other quantum
secure KEM schemes (see Table 7 for performance numbers) that are mostly
lattice-based, the code-based BIKE is relatively slow. This leads to challenges in 
the deployability, especially in embedded environments where the much faster, 
quantum-insecure algorithms need to be replaced for long-term security. 
In this paper, we tackle the performance problem by optimizing a key
operation in BIKE: polynomial multiplication in $\mathbb{F}_2[x]$, called
bit-polynomial multiplication in the following, which is equivalent to
carry-less multiplication. The foundations for our work were laid by Brent et
al. [9] in 2008, who implemented and improved state-of-the-art bit-polynomial
multiplication algorithms and evaluated their performance for varying input
sizes on the Intel Core 2 processor.

We evaluate various algorithmic approaches for bit-polynomial multiplication
in the context of BIKE on two widespread architectures. The ARM Cortex-M4
is the reference platform for embedded benchmarks of PQC schemes selected
by NIST. Much effort has already been put into optimizing BIKE for the
Cortex-M4. However, the implementation presented in this work outperforms
even the previously fastest, highly optimized implementation [12], which utilizes
the FFT. Our second evaluation platform is the RISC-V soft-core VexRiscv.
Its modular, plugin-based design allows us to directly measure the impact of
a dedicated carry-less multiplication instruction. By implementing two different
versions, one that is based on integer multiplication and the radix-16 form, and
the other using a hardware-accelerated carry-less multiplication instruction, we
can explore the design space and trade-offs in terms of performance and area
requirements. Additionally, we port the optimized FFT-based implementation
to the VexRiscv to evaluate its performance against our two implementations
using variants of Karatsuba and Toom-Cook.


1.1 Related Work 


In 2021, Chen et al. [12] presented two optimized implementations for BIKE, 
one tailored for the Intel Haswell architecture and one for the ARM Cortex-M4. 
Besides other optimizations, they adopted two different strategies for
accelerating bit-polynomial multiplication. Their implementation for the x86
Haswell used the carry-less multiplication instruction (pclmulqdq) as the base
multiplier. On top of that, they used multiple stages of Karatsuba and finally
one stage of Bernstein’s [6] five-way recursive multiplication algorithm. Their
implementation for the Cortex-M4, however, used an FFT-based multiplication
algorithm. Although the M4 platform does not provide a carry-less
multiplication instruction, the question arises whether the polynomials in BIKE
are long enough for the FFT-based multiplication to show its asymptotic
advantage on the M4 platform.

Classic McEliece is another quantum-safe, code-based KEM scheme that was 
submitted to the NIST’s post-quantum cryptography standardization process. 
Recently, Chen and Chou [11] presented an optimized, constant-time
implementation of Classic McEliece on the Cortex-M4 platform. They showed that
polynomial multiplication in the small finite fields $\mathbb{F}_{2^{12}}$ and
$\mathbb{F}_{2^{13}}$ can be performed with integer multiplication by
representing data in the radix-16 representation. In this paper we investigate
the radix-16 multiplication approach for BIKE.


Carry-Less to BIKE Faster 835 


The open source RISC-V ISA has been utilized to develop co-processors [14] 
and accelerators [27] to speed up post-quantum cryptography. Pircher et al. [25] 
accelerated Classic McEliece with RISC-V’s vector instruction extension. Sim- 
ilar to their work, we explore RISC-V’s Instruction Set Extensions (ISE) for 
accelerating BIKE’s performance on the VexRiscv core. 


1.2 Contribution 
To summarize, in this work we make multiple contributions. 


— We develop a constant-time implementation for BIKE on the Cortex-M4 
which outperforms Chen et al. [12] by accelerating the bit-polynomial mul- 
tiplication. We present an optimized multiplication based on integer multi- 
plication and the radix-16 format. We show how to perform shift operations 
in the radix-16 format efficiently, to reduce the overhead of transformations 
during the multiplications. We profile different variants of Karatsuba and 
Toom-Cook to determine the optimal algorithm and implementation for the 
given setting. 

— We provide the fastest constant-time implementation of BIKE for the
VexRiscv, and with this the first version optimized for the RISC-V ISA. Our
fastest implementation is more than three times faster than the portable
implementation provided by the BIKE team. We evaluate three different
variants: one based on the radix-16 format, one based on additional
instructions, and, as a supplement, a port of the FFT implementation by Chen
et al. for the M4 to the RISC-V ISA.

— We extend the VexRiscv with carry-less multiplication and conditional move
instructions as proposed by the standard RISC-V ISA extension for bit
manipulation to evaluate the performance impact of the ISE for BIKE. Our
carry-less multiplication implementation is parameterized by a window size
and thus allows a finer choice for the area-performance trade-off.


2 BIKE 


In this section we briefly recap the specification of BIKE [2], a code-based key 
encapsulation mechanism, in the version it was submitted to the third round of 
NIST’s Post-Quantum Cryptography standardization process. 

Figure 1 depicts BIKE’s algorithms for key generation, encapsulation and
decapsulation. A secret key $(h_0, h_1, \sigma)$, with $(h_0, h_1) \in \mathcal{R}^2$ and
$|h_0| = |h_1| = w/2$, together with a public key $h$ is generated by the key
generation algorithm. With the public key $h$ as input, the encapsulation
algorithm outputs a session key $K$ and a ciphertext $c$ in which the session key
is encapsulated. The decapsulation algorithm, on the other hand, takes the
secret key $(h_0, h_1, \sigma)$ and the ciphertext $c$ as inputs and either generates
the session key $K$ or outputs $\bot$.

In the encapsulation and decapsulation, three different hash functions based 
on Keccak are used: H, K and L with the following domains and ranges. 
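KeyGen: $() \mapsto (h_0, h_1, \sigma), h$
Output: $(h_0, h_1, \sigma) \in \mathcal{H}_w \times \mathcal{M}$, $h \in \mathcal{R}$
1: $(h_0, h_1) \stackrel{\$}{\leftarrow} \mathcal{H}_w$
2: $h \leftarrow h_1 h_0^{-1}$
3: $\sigma \stackrel{\$}{\leftarrow} \mathcal{M}$

Encaps: $h \mapsto K, c$
Input: $h \in \mathcal{R}$
Output: $K \in \mathcal{K}$, $c \in \mathcal{R} \times \mathcal{M}$
1: $m \stackrel{\$}{\leftarrow} \mathcal{M}$
2: $(e_0, e_1) \leftarrow \mathbf{H}(m)$
3: $c \leftarrow (e_0 + e_1 h, m \oplus \mathbf{L}(e_0, e_1))$
4: $K \leftarrow \mathbf{K}(m, c)$

Decaps: $(h_0, h_1, \sigma), c \mapsto K$
Input: $((h_0, h_1), \sigma) \in \mathcal{H}_w \times \mathcal{M}$, $c = (c_0, c_1) \in \mathcal{R} \times \mathcal{M}$
Output: $K \in \mathcal{K}$
1: $e' \leftarrow \mathrm{decoder}(c_0 h_0, h_0, h_1)$    ▷ $e' \in \mathcal{R}^2 \cup \{\bot\}$
2: $m' \leftarrow c_1 \oplus \mathbf{L}(e')$    ▷ with the convention $\bot = (0, 0)$
3: if $e' = \mathbf{H}(m')$ then $K \leftarrow \mathbf{K}(m', c)$, else $K \leftarrow \mathbf{K}(\sigma, c)$


Fig. 1. BIKE’s key generation, encapsulation, and decapsulation.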


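— $\mathbf{H}: \{0,1\}^{\ell} \to \{0,1\}^{2r}$ (outputs of Hamming weight $t$),

— $\mathbf{K}: \{0,1\}^{r+2\ell} \to \{0,1\}^{\ell}$,

— $\mathbf{L}: \{0,1\}^{2r} \to \{0,1\}^{\ell}$.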

The computation time spent during the encapsulation is mostly determined by
the cost of the multiplication $e_1 \cdot h$ in $\mathcal{R}$. The inversion of $h_0$ during the
key generation invokes several multiplications in $\mathcal{R}$, as does the decoding
algorithm decoder that is used during the decapsulation.

A multiplication in the ring $\mathcal{R} = \mathbb{F}_2[x]/(x^r - 1)$ consists of a polynomial
multiplication in $\mathbb{F}_2[x]$, i.e., a bit-polynomial multiplication, and a
reduction modulo the polynomial $x^r - 1$. Since the reduction simply folds the
terms of degree $\geq r$ back onto the lower-degree part, the bit-polynomial
multiplication dominates the computing time of multiplications in $\mathcal{R}$.
Altogether one can observe that the multiplication is an essential factor for
the performance of BIKE and thus the focus of our optimizations in this paper.
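The reduction step described above can be illustrated with a minimal C sketch. It is not taken from the BIKE code base: the function name, the bit-by-bit loop and the word layout (bit i of the array holds the coefficient of x^i) are assumptions chosen for readability, and a real implementation would fold whole words at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a bit-polynomial modulo x^r - 1, in place.
 * p     : coefficients, bit i of the array is the coefficient of x^i
 * nbits : number of coefficient bits stored in p (assumed nbits <= 2r)
 * r     : ring parameter (small values are fine for experiments)
 * Hypothetical helper for illustration only. */
void reduce_mod_xr_minus_1(uint32_t *p, size_t nbits, size_t r)
{
    for (size_t i = r; i < nbits; i++) {
        uint32_t bit = (p[i / 32] >> (i % 32)) & 1u;
        /* x^i = x^(i-r) (mod x^r - 1): add the coefficient to the low part */
        p[(i - r) / 32] ^= bit << ((i - r) % 32);
        /* and clear it in the high part, without branching on the data */
        p[i / 32] &= ~((uint32_t)1 << (i % 32));
    }
}
```

For example, with r = 5 the polynomial x^7 + x^5 + x reduces to x^2 + x + 1, since x^7 folds onto x^2 and x^5 onto 1.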

For a detailed specification and security analysis of BIKE we refer the reader
to the BIKE team’s official website [2].


System Parameters. BIKE can be instantiated with three different parameter
sets, each targeting a different security level as proposed by NIST. Table 1 lists
the three sets, denoted as bikel1, bikel3, and bikel5, and their corresponding
parameters $r$, $w$, $t$, and $\ell$. Parameter $r$ determines the size of the polynomials
and is therefore the key parameter for our optimizations.


3 Evaluation Platforms 


For this work we use two different evaluation platforms. The first is the
STM32F4-Discovery development board featuring an ARM Cortex-M4 processor.
The second platform is the VexRiscv soft-core, which we chose for its
modular design.
Table 1. Parameter Sets of BIKE.


Parameter Set   r       w     t     $\ell$
bikel1          12323   142   134   256
bikel3          24659   206   199   256
bikel5          40973   274   264   256


3.1 The ARM Cortex-M4 


Our first evaluation target is the ARM Cortex-M4, a 32-bit RISC processor that 
was already used as a representative for microcontrollers in numerous publica- 
tions about efficient implementations of post-quantum cryptography [18]. 

We choose the STM32F4-Discovery development board as our working platform,
identical to the pqm4 benchmarking project [17]. The STM32F4-Discovery
board comes with the STM32F407VGT6 microcontroller, including the ARM
Cortex-M4 with the floating-point unit. The microcontroller has 192 KB of
SRAM and 1 MB of flash memory and can be clocked at up to 168 MHz.

The M4 implements the ARMv7E-M ISA and provides the programmer with 
thirteen 32-bit general-purpose registers. A further register can be used for com- 
putations when the content of the link register is preserved on the stack. 

In this work, we make extensive use of the multiply-and-accumulate instruction
(umlal). The instruction ‘umlal rdLo, rdHi, rn, rm’ multiplies the two 32-bit
integers in rn and rm and then adds the 64-bit product to the 64-bit value held
in the two 32-bit registers (rdHi, rdLo), all in one clock cycle. Notably, many
instructions on the M4 allow shifting one of the operands without latency
overhead.
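For readers unfamiliar with the instruction, the arithmetic effect of umlal can be modelled in portable C as follows. This is only a behavioural sketch; the function name and the pointer-based interface are assumptions, and the model says nothing about timing.

```c
#include <stdint.h>

/* Behavioural model of `umlal rdLo, rdHi, rn, rm`:
 * (rdHi:rdLo) <- (rdHi:rdLo) + rn * rm, computed modulo 2^64. */
void umlal(uint32_t *rd_lo, uint32_t *rd_hi, uint32_t rn, uint32_t rm)
{
    uint64_t acc = ((uint64_t)*rd_hi << 32) | *rd_lo;
    acc += (uint64_t)rn * rm;
    *rd_lo = (uint32_t)acc;
    *rd_hi = (uint32_t)(acc >> 32);
}
```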

A single 32-bit memory access usually takes 2 cycles. However, loading n 32-bit
words can be as fast as n+ 1 cycles, when consecutive loads can be pipelined and 
no cache misses occur. Storing n 32-bit words takes only n cycles in the best case. 
The optional floating-point unit comes with 32 additional 32-bit registers that 
can also be used to store intermediate values. With a latency of one cycle to move 
data between a general-purpose and a floating-point register, the floating-point 
registers exhibit a faster access time than the RAM. 

We use the pqm4 [17] framework in version 6435b29 to benchmark our imple- 
mentation. It is important to note that the pqm4 framework benchmarks imple- 
mentations at 24 MHz to have zero wait states when accessing code or data in 
the flash memory. Our code is compiled with the arm-none-eabi-gcc 10.2.1
compiler.


3.2 The VexRiscv 


VexRiscv [24] is a modular 32-bit RISC-V processor that features a 5-stage
pipeline with bypassable execute and memory stages, i.e., results may be
forwarded through the pipeline if an instruction does not utilize the respective
stage. VexRiscv is written in the hardware description language SpinalHDL [23]
and features a versatile plugin system which allows defining custom CPU
extensions. Similar to the Cortex-M4, most instructions are executed in one
cycle. We use the default GenFull configuration of VexRiscv that implements the
RISC-V base integer instructions (I) combined with the standard RISC-V
Instruction Set Extensions (ISE) for integer multiplication and division (M) and
atomic instructions (A). GenFull features a dynamic target branch predictor with
4096 kB of cache memory for data and instructions respectively. For the VexRiscv
we compile our code with clang version 13.0.0.

RISC-V is an open and modular ISA with a small set of base integer instruc- 
tions (RV32I/ RV64I) that can be expanded with standard ISA extensions or 
non-standard custom extensions. The RISC-V ISA provides 31 general purpose 
registers for the programmer (x1-x31). 

The RISC-V Bitmanip (B) extension [3] is close to becoming ratified and 
includes many instructions that will be useful for cryptographic implementations. 
Besides advanced logic and bit permutation instructions it includes a conditional 
move instruction (cmov) that allows a branchless, constant-time selection of a 
value without arithmetic detours. One subset of the bit-manipulation extension 
is the Zbc extension that defines carry-less multiplications. Analogous to the
integer multiplication instructions, there is one instruction for computing the
lower half of the product (clmul) and one for the upper half (clmulh).
Carry-less multiplication corresponds to the multiplication of polynomials in
$\mathbb{F}_2[x]$, also named bit-polynomial multiplication in this paper. The
pseudocode is shown in Fig. 2.


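    /* Lower 32 bits of the 64-bit carry-less product of rs1 and rs2. */
    u32 clmul(u32 rs1, u32 rs2) {
        u32 x = 0;
        for (int i = 0; i < 32; i++) {
            if ((rs2 >> i) & 1)
                x ^= rs1 << i;
        }
        return x;
    }

    /* Upper 32 bits of the 64-bit carry-less product of rs1 and rs2. */
    u32 clmulh(u32 rs1, u32 rs2) {
        u32 x = 0;
        for (int i = 1; i < 32; i++) {   /* i = 0 adds nothing to the upper half */
            if ((rs2 >> i) & 1)
                x ^= rs1 >> (32 - i);
        }
        return x;
    }


Fig. 2. Pseudocode for the clmul (top) and clmulh (bottom) instructions; u32 denotes an unsigned 32-bit integer.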


Implementing clmul and cmov. We add the aforementioned instructions
to VexRiscv using the plugin system. Our implementation of clmul features
a configurable window size that defines how many bits of the input registers are
processed in each clock cycle. This allows us to explore trade-offs between area
consumption and execution performance later on. Since the overall input size
is 32 bits, we allow window sizes of $w = 2^i$ with $i \in \{2, \dots, 5\}$. The instruction
therefore has a latency of $32/w$ clock cycles in the execution stage. During that
time, the execution stage is blocked and cannot be used by other instructions
(VexRiscv does not support SMT).
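The relation between window size and latency can be made concrete with a small software model. The sketch below is not the SpinalHDL plugin; it is a hypothetical C model in which each iteration of the outer loop stands for one clock cycle that consumes w bits of rs2, so the cycle counter ends at 32/w. The function name and the cycles output parameter are assumptions.

```c
#include <stdint.h>

/* Software model of a multi-cycle clmul: w bits of rs2 are processed per
 * modelled clock cycle. w is assumed to be one of 4, 8, 16 or 32. */
uint32_t clmul_windowed(uint32_t rs1, uint32_t rs2, unsigned w, unsigned *cycles)
{
    uint32_t x = 0;
    *cycles = 0;
    for (unsigned base = 0; base < 32; base += w) {
        for (unsigned i = base; i < base + w; i++) {
            /* all-ones mask if bit i of rs2 is set, zero otherwise */
            uint32_t mask = 0u - ((rs2 >> i) & 1u);
            x ^= (rs1 << i) & mask;
        }
        (*cycles)++;   /* one window handled per cycle */
    }
    return x;
}
```

With w = 4 the model reports 8 cycles and with w = 32 a single cycle, while the returned lower product half is the same in both cases.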

The cmov instruction is computationally much less complex than clmul.
However, the instruction format requires an additional register encoded in the
opcode of the instruction. Therefore, the VexRiscv RegFile plugin needs to be
modified and the decoding position of rs3 within the opcode needs to be defined
in the RISC-V ISA definition. The cmov plugin itself is fairly simple, as it sets
the target register rd to rs1 or rs3 depending on the value of rs2. The logic is
implemented in the execute stage.

We use the cmov instruction as a counterpart to the sel instruction of the
Cortex-M4 for the barrel shifter that is used in the decoder in BIKE (see Chap. 4
in [11]). Without cmov, a constant-time arithmetic solution requires three
instructions instead.
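One common way to express such a branch-free selection with three logical operations is sketched below. The text does not state which three instructions are used, so the xor-and-xor pattern and both function names are assumptions for illustration.

```c
#include <stdint.h>

/* Expand a single condition bit into a full-width mask:
 * 0x00000000 for bit == 0, 0xFFFFFFFF for bit == 1. */
uint32_t ct_mask(uint32_t bit)
{
    return 0u - (bit & 1u);
}

/* Return a if mask is all ones and b if mask is zero, using only
 * xor, and, xor (no branch and no data-dependent memory access). */
uint32_t ct_select(uint32_t mask, uint32_t a, uint32_t b)
{
    return b ^ (mask & (a ^ b));
}
```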

The RISC-V bit-manipulation extension in general induces a comparatively
large area overhead, as reported in the draft specification [3, Fig. 3.1]. Table 2
shows the area utilization of our ISA extensions for several choices of $w$ on a
Xilinx Artix-7 FPGA (xc7a200tfbg484-3) using Vivado 2021.1 and the
area-optimized synthesis and implementation strategies. The clmul-w
configurations also include the cmov extension, while the original version is
built using the unmodified GenFull configuration without any plugins. The FF
utilization decreases with increasing $w$ since fewer intermediate values need to
be buffered for large window sizes. All designs could be placed with 160 MHz
(+5 MHz).


Table 2. VexRiscv area footprint of the hardware implementation for the conditional
move instruction and carry-less multiplication with different window sizes (clmul-w). The
clmul-w configurations also include cmov.


       Original   cmov   clmul-4   clmul-8   clmul-16   clmul-32
LUT    1831       1954   2140      2169      2306       2459
FF     1634       1675   1747      1746      1737       1669


4 Bit-Polynomial Multiplication 


The fastest algorithm for a multiplication depends on the size of the
polynomials (a polynomial of degree $n - 1$ in $\mathbb{F}_2[x]$ has a size of $n$ bits).
Depending on the architecture, the base multiplication (e.g. of word size) is
either done with a dedicated instruction or with the window method. This base
multiplication can be lifted by a divide-and-conquer algorithm to an
intermediate size. For polynomials of very large size, an FFT-based approach
offers the fastest asymptotic runtime.

Brent et al. [9] examined the performance of bit-polynomial multiplication
algorithms on the Intel Core 2 processor. In particular, they report that the
FFT-based approach first outperforms the Karatsuba/Toom-Cook algorithms at a
polynomial size of 2461 × 64-bit words (157504 bits). They report 3295 × 64-bit
words (210880 bits) as the largest size where a Karatsuba/Toom-Cook variant
runs faster than the FFT-based multiplications. However, their results are
not directly transferable to other architectures, as the available instructions
differ, which can lead to clearly different costs for the base multiplications. For
example, Brent et al. use Intel’s SSE extension with 128-bit SIMD instructions,
whereas the embedded platforms we use in this work natively only offer 32-bit
instructions. Game-changing instructions like clmul, on the other hand, lower
the cost of the base multiplication considerably. The different costs of the base
multiplication for Karatsuba/Toom-Cook algorithms can considerably move the
threshold where FFT-based approaches become faster.


4.1 Single-Word Polynomial Multiplication 


Multiplication of polynomials with the size of a word (32-bit for the processors 
we used in this work) is most efficient with a dedicated instruction. In 2008 Intel 
introduced [15] an instruction for carry-less multiplication to accelerate crypto- 
graphic computations. Such an instruction is also available on other architectures 
such as PowerPC, SPARC, or ARMv8.

However, embedded processors like the Cortex-M4 are usually not equipped
with such an instruction. In this case the window method [9] is the fastest known
algorithm. To compute the product $c = a \cdot b$ of two polynomials, the window
algorithm, parameterized with the window size $s$, first precomputes a
table with $2^s$ entries. The table consists of the multiples of $b$ by all polynomials
of degree smaller than $s$. In the next step the algorithm iterates over $a$, looking
at $s$ bits at a time, and composes the product $c$ out of the precomputed values.
However, we note that this method is not suitable for architectures with a data
cache, since it queries the table with values from the input operand, which makes
it vulnerable to cache-timing side-channel attacks [5].
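A minimal C sketch of the window method for a 32-bit base multiplication with window size s = 4 is given below. The function name and the fixed window size are assumptions; the table lookup is indexed by bits of the operand a, which is exactly the property that makes the method unsuitable when a is secret and a data cache is present.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit carry-less product with the window method, s = 4.
 * Illustration only: the lookups depend on the bits of a. */
uint64_t clmul_window4(uint32_t a, uint32_t b)
{
    uint64_t table[16];

    /* Precompute b times every polynomial of degree < 4. */
    table[0] = 0;
    table[1] = b;
    for (unsigned i = 2; i < 16; i += 2) {
        table[i] = table[i / 2] << 1;   /* multiply entry i/2 by x */
        table[i + 1] = table[i] ^ b;    /* then add b              */
    }

    /* Scan a four bits at a time, most significant window first. */
    uint64_t c = 0;
    for (int k = 28; k >= 0; k -= 4)
        c = (c << 4) ^ table[(a >> k) & 0xFu];
    return c;
}
```

Squaring the all-ones polynomial is a convenient sanity check: in characteristic 2 it spreads the bits, so the product of 0xFFFFFFFF with itself is 0x5555555555555555.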

The window algorithm is used in the portable implementation; we also use
it in our hardware implementation of the carry-less multiplication instruction
for RISC-V.


4.2 Multiplication for Intermediate-Sized Polynomials 


For polynomials of medium size it is beneficial to apply a divide-and-conquer 
algorithm that divides the polynomials into smaller parts.


Karatsuba. The well known Karatsuba [19] algorithm breaks down the mul- 
tiplication of two polynomials into three half-size multiplications and can be 
recursively applied until the size of the base multiplication (usually word size) 
is reached. 

To multiply two polynomials $F = F_0 + F_1 t^n$ and $G = G_0 + G_1 t^n$ of degree
$2n$, the Karatsuba algorithm reduces the task to three multiplications of
polynomials of degree $n$. After computing $F_0 G_0$, $(F_0 + F_1)(G_0 + G_1)$ and
$F_1 G_1$, the product $FG$ can be determined by computing

$$F_0 G_0 + \big((F_0 + F_1)(G_0 + G_1) - F_0 G_0 - F_1 G_1\big) t^n + F_1 G_1 t^{2n}.$$

Recursively applied, the Karatsuba algorithm requires $O(n^{\log_2 3})$ base
multiplications.
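As a small worked instance of the formula above, the following C sketch performs one Karatsuba layer over $\mathbb{F}_2[x]$ with n = 16, i.e., it builds a 32-bit carry-less multiplication from three 16-bit ones. Addition and subtraction both become xor. The bit-serial base multiplier clmul16 and all names are assumptions made for this illustration, not the implementation evaluated in the paper.

```c
#include <stdint.h>

/* Bit-serial 16 x 16 -> 32-bit carry-less base multiplication. */
static uint32_t clmul16(uint16_t a, uint16_t b)
{
    uint32_t x = 0;
    for (unsigned i = 0; i < 16; i++) {
        uint32_t mask = 0u - (((uint32_t)b >> i) & 1u);
        x ^= ((uint32_t)a << i) & mask;
    }
    return x;
}

/* One Karatsuba layer: F = F0 + F1 t^16, G = G0 + G1 t^16. */
uint64_t karatsuba32(uint32_t f, uint32_t g)
{
    uint16_t f0 = (uint16_t)f, f1 = (uint16_t)(f >> 16);
    uint16_t g0 = (uint16_t)g, g1 = (uint16_t)(g >> 16);

    uint32_t lo = clmul16(f0, g0);                   /* F0 G0 */
    uint32_t hi = clmul16(f1, g1);                   /* F1 G1 */
    uint32_t mid = clmul16((uint16_t)(f0 ^ f1),
                           (uint16_t)(g0 ^ g1));     /* (F0+F1)(G0+G1) */
    mid ^= lo ^ hi;                                  /* middle coefficient */

    return (uint64_t)lo ^ ((uint64_t)mid << 16) ^ ((uint64_t)hi << 32);
}
```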


Karatsuba Variants. Weimerskirch and Paar [30] provided a generalization of 
Karatsuba for polynomial multiplication of arbitrary degree and recursive use. 
In 2005, Montgomery [21] developed Karatsuba variants for multiplication of 
polynomials with five, six and seven terms k in contrast to the two terms of the 
classic Karatsuba algorithm. 

In 2009, Bernstein [6] improved the Karatsuba algorithm for $k = 2$ by reducing
the number of required additions and named it the refined Karatsuba:


$$(1 - t^n)(F_0 G_0 - F_1 G_1 t^n) + (F_0 + F_1)(G_0 + G_1) t^n.$$


Speaking in terms of compiler optimizations, this is basically an improvement by 
a common subexpression elimination. In the same paper Bernstein introduced 
an optimized recursive multiplication algorithm for polynomials with k = 3. 
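Bernstein’s five-way algorithm [6] computes the product $H = FG$ of two
polynomials $F = F_0 + F_1 x^n + F_2 x^{2n}$ and $G = G_0 + G_1 x^n + G_2 x^{2n}$ of
degree $3n$ by using

$$H = U + H(\infty)(x^{4n} + x^n) + \frac{U + V + H(\infty)(x^4 + x)}{x^2 + x}\,(x^{2n} + x^n)$$

with

$$U = H(0) + (H(0) + H(1))x^n \quad\text{and}\quad V = H(x) + (H(x) + H(x + 1))(x^n + x).$$

It requires five multiplications of polynomials of degree $n$:

$$\begin{aligned}
H(0) &= F_0 \cdot G_0,\\
H(1) &= (F_0 + F_1 + F_2) \cdot (G_0 + G_1 + G_2),\\
H(x) &= (F_0 + F_1 x + F_2 x^2) \cdot (G_0 + G_1 x + G_2 x^2),\\
H(x + 1) &= (F_0 + F_1 (x + 1) + F_2 (x + 1)^2) \cdot (G_0 + G_1 (x + 1) + G_2 (x + 1)^2),\\
H(\infty) &= F_2 \cdot G_2.
\end{aligned}$$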


Toom-Cook. The Toom-Cook [13] algorithm is a generalization of the Karatsuba
algorithm and is modular in the number of terms $k$ the polynomials are
divided into. With $k = 2$ the Toom-Cook algorithm corresponds to Karatsuba;
with, e.g., $k = 3$ the Toom-Cook algorithm divides the input polynomials into
three terms and reduces the number of smaller multiplications from nine to five,
compared to the schoolbook method. This leads to an asymptotic runtime of
$O(n^{\log_3 5})$. With growing parameter $k$ the overhead of the algorithm becomes
bigger and thus limits its usage. In practice [7,9], Toom-Cook with $k = 2$ (TC2),
$k = 3$ (TC3) and $k = 4$ (TC4) are used for efficient intermediate-size
multiplication. Bernstein’s five-way algorithm corresponds to Toom-Cook with $k = 3$.
The usage of Toom-Cook multiplication in $\mathbb{F}_2[x]$ comes with special
characteristics. TC3, for example, requires five evaluation points, but
$\mathbb{F}_2$ only offers the two elements 0 and 1 and the point $\infty$. However, it
is possible to use any power of the variable $x$ as an additional evaluation
point [9]. The disadvantage of this method is that the size of the polynomials
for some of the submultiplications increases slightly. This makes an
implementation more complicated, for example compared to the classic
Karatsuba, where a polynomial with a size $n = 2^l$ can be neatly divided into $l$
(recursive) layers. Bodrato [7] showed that any Toom-Cook algorithm in
$\mathbb{F}_2[x]$ with $k > 2$ requires at least one division. For smaller polynomial
sizes, this can, for example, make Montgomery’s Karatsuba variants, which do not
require a division, faster despite their slower asymptotic runtime. Brent
et al. provide a word-aligned variant of Toom-Cook with $k = 3$ (TC3W) that
uses $0, 1, x^w, x^{-w}$ and $\infty$ as evaluation points (with word size $w$). The
advantage of this variant is that all suboperations, including divisions, operate
at word-size granularity and thus no bit shifts are required. This is especially
beneficial for our radix-16 approach as explained in the next section.


4.3 Multiplication for Large Polynomials 


In 1971 Schönhage and Strassen [28] demonstrated how to multiply large
integers in $O(n \log n \log\log n)$ by using the FFT. Cantor and Kaltofen [10] later
generalized this method to arbitrary polynomials. To perform a multiplication
based on an FFT, one first transforms the two input polynomials into the FFT
domain, computes the actual multiplication in the FFT domain, and transforms
the product back to the original domain by an inverse FFT. For
multiplications that share an operand, the shared polynomial is kept in the
FFT domain. This reduces the number of necessary transformations and thus
the cost. The reduction step during a multiplication in the ring
$\mathbb{F}_2[x]/(x^r - 1)$ requires the back transformation of the bit-polynomial
multiplication result, even for consecutive dependent multiplications.

Chen et al. [12] used a Frobenius Additive FFT (FAFFT) for the
bit-polynomial multiplication in BIKE on the ARM Cortex-M4 platform. We ported
their implementation to the RISC-V architecture for comparison.


5 Bit-Polynomial Multiplication in the Radix-16 
Representation 


Bit-Polynomial Multiplication via Integer Multiplication. An uncommon
option to implement bit-polynomial multiplication uses integer multiplication
in combination with data in a radix-16 representation. In the radix-16
representation, one expresses a degree-7 polynomial
$a = \sum_{i=0}^{7} a_i x^i \in \mathbb{F}_2[x]$ as the 32-bit integer
$a_0 + a_1 2^4 + a_2 2^8 + \cdots + a_7 2^{28}$. Multiplying polynomials $a \cdot b \to c$
in this form with integer multiplication yields

$$(a_0 + a_1 2^4 + a_2 2^8 + \cdots + a_7 2^{28}) \cdot (b_0 + b_1 2^4 + b_2 2^8 + \cdots + b_7 2^{28})$$
$$= a_0 b_0 + (a_1 b_0 + a_0 b_1) 2^4 + (a_2 b_0 + a_1 b_1 + a_0 b_2) 2^8 + \cdots + (a_7 b_7) 2^{56},$$

an integer where the bit of index $4i$ is exactly $c_i$, and thus, after masking out
the other indices, $c$ remains in radix-16 representation. Chen and Chou [11]
presented the multiplication in radix-16 format and applied it to multiplication
in $\mathbb{F}_{2^{12}}$ and $\mathbb{F}_{2^{13}}$, i.e., polynomials of 12 and 13 bits. In this
work, we present the techniques for extending the method to bit-polynomial
multiplication in BIKE, including the optimization of the 32-bit base
multiplication, data conversion, the logical shift operation, and building
multiplications for polynomials of various sizes in the following.
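The identity above can be checked with a few lines of C. The sketch multiplies two 8-bit polynomials with a single 64-bit integer multiplication and then keeps only bit 4i of every nibble. The helper names and the bit-serial conversion loops are assumptions chosen for clarity, not the optimized conversion used on the target platforms.

```c
#include <stdint.h>

/* Radix-16 form of an 8-bit polynomial: bit i moves to bit 4i. */
static uint32_t to_radix16(uint8_t b)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 8; i++)
        r |= (uint32_t)((b >> i) & 1u) << (4u * i);
    return r;
}

/* Carry-less product of two 8-bit polynomials (15-bit result),
 * computed with one integer multiplication. */
uint16_t clmul8_radix16(uint8_t a, uint8_t b)
{
    uint64_t prod = (uint64_t)to_radix16(a) * to_radix16(b);

    /* Every nibble holds a sum of at most eight 1-bits, so no carry
     * crosses a nibble boundary; its lowest bit is the coefficient c_i. */
    prod &= 0x1111111111111111ULL;

    uint16_t c = 0;
    for (unsigned i = 0; i < 15; i++)
        c |= (uint16_t)(((prod >> (4u * i)) & 1u) << i);
    return c;
}
```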


Table 3. Performing an 8-bit bit-polynomial multiplication with 32-bit integer
multiplication in radix-16 form.


        00010001000100010001000100010001 × 00010001000100010001000100010001
                                        0001 0001 0001 0001 0001 0001 0001 0001
                                   0001 0001 0001 0001 0001 0001 0001 0001
                              0001 0001 0001 0001 0001 0001 0001 0001
                         0001 0001 0001 0001 0001 0001 0001 0001
                    0001 0001 0001 0001 0001 0001 0001 0001
               0001 0001 0001 0001 0001 0001 0001 0001
          0001 0001 0001 0001 0001 0001 0001 0001
     0001 0001 0001 0001 0001 0001 0001 0001
0000 0001 0010 0011 0100 0101 0110 0111 1000 0111 0110 0101 0100 0011 0010 0001


Base Multiplication in Radix-16 Form. Table 3 shows an example of an
8-bit bit-polynomial multiplication with a 32-bit integer multiplication using the
radix-16 form. Even during a multiplication of two radix-16 values with all bits
set, the carry bits do not propagate. Furthermore, this example demonstrates
that the lower half of the multiplication result can be added to the higher half
(both residing in a 32-bit register) without a prior reduction. One can see that
the pairwise sum of the nibbles is capped at eight and therefore does not lead
to a carry propagation to the next nibble. This observation allows the use of the
powerful multiply-with-accumulate instruction (umlal) of the Cortex-M4 during
the combination of sixteen 8-bit bit-polynomial multiplications to perform one
32-bit multiplication.

The radix-16 format quadruples the size of a polynomial during computation.
To avoid this memory overhead when storing polynomials, we pack four bytes in
radix-16 format in one register by shifting byte i by i bits to the left.
To perform one 32-bit bit-polynomial multiplication we therefore extract four
bytes for each operand, perform 16 integer multiplications with reductions in
between and pack the result in two 32-bit registers. For the Cortex-M4 we are
able to express this operation with only 46 instructions, by using the umlal
instruction as explained before and the barrel shifter. For our RISC-V
implementation we need 89 instructions for the same operation, not only because
the barrel shifter and the multiply-with-accumulate instruction are missing, but
also because multiplications of 32-bit values are expressed with two
instructions, one for the lower half and one for the upper half of the result.
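The composition of one 32-bit multiplication from sixteen 8-bit radix-16 products can be sketched in C as follows. This sketch deliberately reduces every partial product on its own and therefore does not reproduce the accumulation with umlal or the instruction counts quoted above; all function names are assumptions.

```c
#include <stdint.h>

/* Radix-16 form of one byte: bit i moves to bit 4i. */
static uint32_t spread8(uint8_t b)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 8; i++)
        r |= (uint32_t)((b >> i) & 1u) << (4u * i);
    return r;
}

/* Collect bit 4i of the integer product into bit i (15 coefficients). */
static uint64_t collect15(uint64_t prod)
{
    uint64_t c = 0;
    for (unsigned i = 0; i < 15; i++)
        c |= ((prod >> (4u * i)) & 1u) << i;
    return c;
}

/* 32 x 32 -> 64-bit carry-less product from 16 integer multiplications. */
uint64_t clmul32_radix16(uint32_t a, uint32_t b)
{
    uint64_t c = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t ai = spread8((uint8_t)(a >> (8u * i)));
        for (unsigned j = 0; j < 4; j++) {
            uint32_t bj = spread8((uint8_t)(b >> (8u * j)));
            uint64_t prod = (uint64_t)ai * bj;   /* nibble sums stay <= 8 */
            c ^= collect15(prod) << (8u * (i + j));
        }
    }
    return c;
}
```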


Data Conversion for the Radix-16 Representation. Similar to a polynomial
multiplication via an FFT, the input and output polynomials have to be
transformed to and from the radix-16 representation for the multiplication.

Since an 8-bit polynomial is stored in a 32-bit register in the radix-16
representation, a straightforward approach would increase the memory footprint
by a factor of four. Even more importantly, it would multiply the number of
memory instructions for a bit-polynomial multiplication by four. One can,
however, store four 8-bit polynomials in radix-16 representation in one 32-bit
register.

Figure 3 shows the steps of data movement in a 32-bit register for converting
data to the radix-16 representation. The leftmost table shows the original data,
and the fields represent register bits ordered from right to left and top to bottom.
The conversion method is adapted from the matrix transpose algorithm. In each
step, we swap the data in blue with the data in green. The first step swaps two
2 x 4 matrices. The second step swaps off-diagonal 2 x 2 matrices in two 4 x 4
matrices. The last step swaps off-diagonal elements in all 2 x 2 matrices. In the
rightmost table, the data has been split into 4 lanes (columns). The original data
in bits 0 to 7 move to the first lane, bits 8 to 15 move to the third lane, and
so on.
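A conversion of this kind can be written as three masked swaps. In the C sketch below, the masks and shift distances are not taken from the paper. They were derived for this illustration so that the result has the lane layout described above, under the assumption that bits 16 to 23 end up in the second lane and bits 24 to 31 in the fourth; the order of the authors' swap steps may differ. A bit-by-bit reference function is included to document the assumed target layout.

```c
#include <stdint.h>

/* Reference layout (assumption): data bit d = 8*byte + j moves to
 * position 4*j + lane, with lane(0)=0, lane(1)=2, lane(2)=1, lane(3)=3. */
uint32_t to_radix16_ref(uint32_t x)
{
    uint32_t p = 0;
    for (unsigned d = 0; d < 32; d++) {
        unsigned byte = d >> 3, j = d & 7u;
        unsigned lane = ((byte & 1u) << 1) | (byte >> 1);
        p |= ((x >> d) & 1u) << (4u * j + lane);
    }
    return p;
}

/* Swap the bits selected by mask with the bits delta positions above. */
static uint32_t delta_swap(uint32_t x, uint32_t mask, unsigned delta)
{
    uint32_t t = ((x >> delta) ^ x) & mask;
    return x ^ t ^ (t << delta);
}

/* The same permutation as to_radix16_ref, done with three swap steps. */
uint32_t to_radix16_swaps(uint32_t x)
{
    x = delta_swap(x, 0x0000F0F0u, 12);
    x = delta_swap(x, 0x00CC00CCu, 6);
    x = delta_swap(x, 0x0A0A0A0Au, 3);
    return x;
}
```

Each step exchanges two bits of the 5-bit position index, which is why three steps suffice for this particular lane order.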


[Figure: four 8 × 4 tables showing the register contents before the conversion
and after each of the three swap steps. Only the final layout is recoverable:
nibble $m$ of the register ($m = 0, \dots, 7$) holds, from most to least
significant bit, the original bits $24 + m$, $8 + m$, $16 + m$ and $m$.]


Fig. 3. Swap steps for converting 32-bit data to the radix-16 representation.


Logic Shift for the Radix-16 Data. Analogous to the FFT-based bit-polynomial
multiplication, the multiplication in the radix-16 form becomes more efficient
when intermediate results of consecutive multiplications remain in the radix-16
representation. However, the divide-and-conquer algorithms that lift our base
multiplication to the size that is required for BIKE introduce additional
operations during the smaller multiplications.

Some operations, e.g. addition in $\mathbb{F}_2[x]$, which corresponds to an
xor operation, can be applied to polynomials in radix-16 representation without
any adaptation. Other operations, in particular logic shifting, require a
radix-16 specific implementation that can be costly.

Figure 4 shows an example of a shift-left-by-1 for data in the radix-16
representation. In the figure, bits in registers are ordered from right to left
and top to bottom. The four lanes of a 32-bit register are shown as four rows.
We use the multiply-with-accumulate instruction (umlal) of the Cortex-M4 to
perform the shift operations for radix-16 data. For shifting left by $i$, we set
one operand to the constant $2^{4i}$ and the other operand to the data to be
shifted. After the umlal instruction, the register holding the lower part of the
product contains most of the shifted result. Some data has been moved to the
higher part of the product, and hence we spend extra operations to move it to
the correct positions. Applying the same idea to the shift-right operation, we
set the constant to $2^{32-4i}$ and collect the main part of the shifted result
in the higher part of the product.
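The following Python sketch models the shift-left idea with a plain 64-bit product. It is an illustration under assumptions, not the authors' assembly: the function names are hypothetical, the lane order (byte 0 to lane 0, byte 1 to lane 2, byte 2 to lane 1, byte 3 to lane 3) is taken from the description of Fig. 3, and the fix-up masks are derived from that lane order. The accumulate part of umlal is not modeled.

```python
LANE_OF_BYTE = (0, 2, 1, 3)  # assumed lane order, as described for Fig. 3

def to_radix16(x):
    """Reference conversion: bit b goes to nibble b % 8 in the lane of its byte."""
    out = 0
    for b in range(32):
        if (x >> b) & 1:
            out |= 1 << (4 * (b % 8) + LANE_OF_BYTE[b // 8])
    return out

def shl_radix16(r, i):
    """Shift a packed radix-16 word left by i bit positions (0 <= i < 8)."""
    assert 0 <= i < 8
    prod = r * (1 << (4 * i))   # models the 32x32 -> 64-bit product
    lo = prod & 0xFFFFFFFF      # bits that stay inside their byte
    hi = prod >> 32             # bits that crossed a byte boundary
    # Move the crossed bits into the lane of the next byte:
    # lane 0 -> lane 2, lane 1 -> lane 3 (shift left by 2), lane 2 -> lane 1
    # (shift right by 1); lane 3 holds bits shifted out of the 32-bit word.
    carry = ((hi & 0x33333333) << 2) | ((hi & 0x44444444) >> 1)
    return lo | carry
```

A shift right would follow the same pattern with the constant $2^{32-4i}$, taking the main part from the high half and fixing up the bits that end in the low half.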


Fig. 4. Shift left by 1 in radix-16 representation.


Lifting the 32-Bit Base-Multiplication with Karatsuba and Toom-Cook. The
portable implementation [2] of the BIKE team recursively applies the refined
Karatsuba algorithm to lift its base multiplication to 16384 bits for bikel1,
although bikel1 only requires a 12323-bit polynomial multiplication. The
optimized BIKE implementation for the x86 by Chen et al. [12] improves this
algorithmic approach by using one layer of Bernstein’s five-way Karatsuba at the
top: the base multiplication is lifted to 4096 bits with the same refined
Karatsuba and then combined to 12288 bits with the five-way Karatsuba. With a
small overhead they further lift it to 12350 bits.
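The recursive lifting can be sketched as follows. This is a minimal sketch of plain two-way Karatsuba over $\mathbb{F}_2[x]$, with Python integers standing in for bit-polynomials; it is not the refined Karatsuba or the five-way variant used by the implementations discussed above, and the function names are hypothetical.

```python
def clmul_schoolbook(a, b):
    """Carry-less (bit-polynomial) product of a and b, shift-and-xor style."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba_gf2(a, b, nbits, base_bits=32):
    """Multiply two bit-polynomials of at most nbits bits with three
    half-size products per level instead of four."""
    if nbits <= base_bits:
        return clmul_schoolbook(a, b)   # stands in for the base multiplication
    half = nbits // 2
    high = nbits - half                 # size bound for the upper halves
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    lo = karatsuba_gf2(a0, b0, high, base_bits)
    hi = karatsuba_gf2(a1, b1, high, base_bits)
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, high, base_bits)
    # In characteristic 2, addition and subtraction are both xor.
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << (2 * half))
```

With a 32-bit base multiplication, nine such levels reach 32 · 2^9 = 16384 bits, which is the size the portable implementation uses for bikel1.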

For our implementations we first followed the algorithmic approach by Chen
et al. and combined it with radix-16 specific operations in $\mathbb{F}_2[x]$
that allow us to keep the subpolynomials in the radix-16 form for the entire
multiplication, except for the one division in Bernstein’s five-way Karatsuba.
As already mentioned in Sect. 4, the word-aligned Toom-Cook 3 (TC3W) algorithm,
which divides a polynomial into three parts and requires five submultiplications
just as Bernstein’s five-way Karatsuba, does not require radix-16 specific
operations. This is because all operations in this algorithm operate on
word-size granularity, at which the radix-16 format does not differ from the
normal representation of bit-polynomials. The drawback of this algorithm is the
small increase in the size of the polynomials for the submultiplications. Beyond
that, we tested the usefulness of TC4 for bit-polynomial multiplication with
sizes relevant for BIKE on our embedded platforms.

We also adapted the gf2x library [8] by Brent et al. for bare-metal use on our
evaluation platforms. It includes implementations of TC3, TC3W, TC4 and
Karatsuba (including variants by Weimerskirch et al. and Montgomery) optimized
for $\mathbb{F}_2[x]$. Based on nine pretuned base multiplications for
polynomials of one to nine 32-bit words, which we partially optimized in
assembly, we benchmarked the cycles for each Toom-Cook variant and the refined
Karatsuba for all polynomial sizes up to the size required by BIKE. The optimal
algorithm for each size is fed back to the implementation. The testing showed
that TC3W slightly outperforms Bernstein’s five-way Karatsuba for our case and
is the optimal choice for the first layers until the refined Karatsuba is used.
The smallest input size for which TC3W is used is 19 × 32-bit words (608 bits);
the largest size for which the refined Karatsuba is used is 129 × 32-bit words
(4128 bits). TC4 did not show better results for our specific setting.


6 Evaluation 


Figure 5 shows the profiling results of the portable implementation of BIKE 
on the VexRiscv, which clearly indicate the importance of the bit-polynomial 
multiplication for BIKE. In this implementation the base multiplication function 
contributes 36.41% of the overall computation time of the KEM, and more than 
half of the time is spent in the full-size multiplication. 


6.1 Comparisons of Multipliers 


To directly compare the different approaches for bit-polynomial multiplication
for BIKE, we measure the cycles spent on the Cortex-M4 for one multiplication,
including all transformations if necessary. Table 4 shows our measurements
comparing this work with the portable implementation [2] of the BIKE team and
the FFT based implementation by Chen et al. [12]. Recall that in the bikel1
parameter set polynomials are 12323 bits long, and in the bikel3 parameter set
24659 bits.

The portable implementation is written in C and based on Karatsuba with the
window algorithm as its building block. The FFT based approach considerably
outperforms the portable version for the two parameter sets bikel1 and bikel3
and is more than two times faster on the M4. Yet, our approach based on the
radix-16 representation has an about 23% smaller cycle count than the FFT
approach for bikel1. For bikel3, on the other hand, the asymptotically smaller
runtime causes the FFT based multiplication to be slightly faster. These
findings eliminate the radix-16 approach for any parameter set other than
bikel1. For the further evaluation we thus concentrate on bikel1 only. A
relevant conclusion of these experiments is also that the boundary that defines
whether the Karatsuba and Toom-Cook approach or the FFT approach is most
efficient is at around 24659 bits on the Cortex-M4.

Fig. 5. Profiling of the portable implementation of BIKE on the VexRiscv using [4].
Functions with less than 2% impact on the overall execution time are omitted.
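As a quick check of the relative figures quoted above, the following sketch recomputes them from the Cortex-M4 cycle counts of Table 4. The dictionary layout and names are only illustrative.

```python
# Cycle counts copied from Table 4 (Cortex-M4 columns).
M4_CYCLES = {
    "bikel1": {"radix16": 1019544, "fft": 1320940, "portable": 2897887},
    "bikel3": {"radix16": 2937113, "fft": 2929293, "portable": 9606051},
}

def relative_saving(new, old):
    """Fraction of cycles saved by `new` compared to `old`."""
    return 1.0 - new / old

l1 = M4_CYCLES["bikel1"]
l3 = M4_CYCLES["bikel3"]

saving_l1 = relative_saving(l1["radix16"], l1["fft"])   # radix-16 vs. FFT, bikel1
fft_speedup_l1 = l1["portable"] / l1["fft"]             # FFT vs. portable, bikel1
saving_l3 = relative_saving(l3["radix16"], l3["fft"])   # negative: FFT is faster
```

The bikel3 difference is well below one percent, which is why the crossover point is placed at roughly that polynomial size.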

It is important to note that the impact of the transformations is small for
radix-16 but very pronounced for the FFT approach. Chen et al. report a
reduction of about 30% in cycle consumption for one multiplication when one
input transformation is omitted. In this case the FFT multiplication is faster
than our approach, which becomes apparent in BIKE’s decoder.

For the VexRiscv we see in Table 4 that the Instruction Set Extension (ISE)
allows an implementation that clearly outperforms all other multiplication
approaches and is more than seven times faster than the portable version
provided by the BIKE team, even in the configuration with the smallest area
footprint.

The performance of the radix-16 implementation on the VexRiscv is noteworthy:
in contrast to the Cortex-M4, the multiplication in radix-16 is not faster than
the FFT based approach. The main reason is the different integer multiplication
instructions. On the Cortex-M4 we can add a value to a 64-bit multiplication
result in one cycle, whereas on the VexRiscv we need two instructions alone to
compute the lower and higher half of the multiplication, each of which consumes
one cycle. This small difference between the instruction sets of the two
architectures causes the disparity in the algorithmic performance.


Table 4. Cycle counts for one multiplication in R on the Cortex-M4 and the
VexRiscv.

Platform    Parameter set   Cycles     Implementation
Cortex-M4   bikel1          1019544    Radix-16
Cortex-M4   bikel1          1320940    FFT [12]
Cortex-M4   bikel1          2897887    portable [2]
Cortex-M4   bikel3          2937113    Radix-16
Cortex-M4   bikel3          2929293    FFT [12]
Cortex-M4   bikel3          9606051    portable [2]
VexRiscv    bikel1          749287     ISE (w=4)
VexRiscv    bikel1          662963     ISE (w=8)
VexRiscv    bikel1          620479     ISE (w=16)
VexRiscv    bikel1          598744     ISE (w=32)
VexRiscv    bikel1          2224293    Radix-16
VexRiscv    bikel1          1756734    FFT [12]
VexRiscv    bikel1          5313565    portable [2]
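The instruction-level difference can be modeled in Python. This is only a behavioral sketch: `umlal` follows the Arm semantics (64-bit accumulator plus 32 × 32-bit product), `mul` and `mulhu` follow RV32M, and the sequence in `mac_riscv` is one possible way to emulate the accumulation, not necessarily the one used in the implementation described here.

```python
M32 = 0xFFFFFFFF

def umlal(acc_lo, acc_hi, a, b):
    """Cortex-M4 style: {acc_hi, acc_lo} += a * b, as one instruction."""
    acc = (((acc_hi << 32) | acc_lo) + a * b) & 0xFFFFFFFFFFFFFFFF
    return acc & M32, acc >> 32

def riscv_mul(a, b):
    """RV32M mul: lower 32 bits of the product."""
    return (a * b) & M32

def riscv_mulhu(a, b):
    """RV32M mulhu: upper 32 bits of the unsigned product."""
    return (a * b) >> 32

def mac_riscv(acc_lo, acc_hi, a, b):
    """Same result as umlal, built from separate RV32 operations."""
    lo = riscv_mul(a, b)                  # instruction 1
    hi = riscv_mulhu(a, b)                # instruction 2
    new_lo = (acc_lo + lo) & M32          # add
    carry = 1 if new_lo < lo else 0       # sltu
    new_hi = (acc_hi + hi + carry) & M32  # two more adds
    return new_lo, new_hi
```

Both functions return the same pair of words; the sketch only makes visible how many separate steps the second variant needs.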


6.2 Performance of BIKE KEM 


Measuring the three KEM operations for BIKE demonstrates that our radix-16 
multiplication approach beats the current fastest implementations of BIKE on 
the Cortex-M4 (Table 5). For the bikel1 parameter set we observe an improve- 
ment in cycles of about 13% for the key generation and about 7% for the encap- 
sulation. The decapsulation is about 3% slower than the FFT based approach. 
This is due to the many multiplications in the decoder where input transforma- 
tions can be omitted and thus the FFT based multiplications are faster. For an 
implementation where code size is insignificant, one could use the FFT based 
multiplication for the decoder and our radix-16 multiplication in the key gener- 
ation and encapsulation to achieve the best overall performance. 


Table 5. Cycle counts for BIKE on the Cortex-M4.

Parameter set   Key Gen     Encaps     Decaps       Implementation
bikel1          21137291    2989187    50832769     Radix-16
bikel1          24935033    3253379    49911673     FFT [12]
bikel1          65414337    4824059    114592442    portable [2]


For the VexRiscv we provide three different implementations. The first is based
on the multiplication in radix-16 representation and derived from our Cortex-M4
implementation. The second implementation is a port of the FFT based approach
from Chen et al. and enables us to compare the performance of the different
algorithmic foundations also on the VexRiscv. The third implementation uses the
clmul and cmov instructions, which are included in the RISC-V ISA extension B.
The distinct performance advantage of the additional instructions comes with
higher costs in hardware area, though. As described in Sect. 3.2, our clmul
implementation can be configured with different window sizes w to allow a
tradeoff between area and performance.

As expected, the FFT based approach is much faster than the portable
implementation on the VexRiscv: the key generation is about 58%, the
encapsulation 44% and the decapsulation 48% faster, as shown in Table 6. The
multiplication in radix-16 cannot compete with the multiplication based on the
FFT, as shown in Sect. 6.1, and the corresponding BIKE implementation behaves
accordingly.

When the clmul and cmov instructions are added to the VexRiscv, the picture
clearly changes again. Compared to the implementation based on the FFT, our
implementation with the additional instructions saves more than 46% during key
generation and about 19% during encapsulation and decapsulation in the smallest
setting. More than 2.5 million cycles can be saved in the key generation alone
when the implementation of the clmul instruction is switched from a window size
of 4 to a window size of 32. We also measured the impact of both instructions
individually: the cmov instruction alone is responsible for a reduction of about
ten million cycles in the decapsulation.


Table 6. Cycle counts for BIKE on the VexRiscv.

Parameter set   Key Gen      Encaps     Decaps       Implementation
bikel1          18189581     4234986    72917099     ISE (w=4)
bikel1          16726630     4149304    71623427     ISE (w=8)
bikel1          15993482     4105239    70978980     ISE (w=16)
bikel1          15627784     4085334    70659901     ISE (w=32)
bikel1          46301780     6396252    107814753    Radix-16
bikel1          34275830     5264852    88810190     FFT [12]
bikel1          111916192    9483762    160643740    portable [2]


6.3 Comparison with Other NIST Post-quantum Candidates 


The pqm4 [17] includes implementations for most of the KEMs that are candi- 
dates in the third round of the post-quantum standardization process by NIST. 
We are not aware of any implementation of the code-based KEM HQC [20] 
for the Cortex-M4. For Classic McEliece [1], the third code-based KEM that 
is still a candidate in the standardization process, an optimized constant-time 
implementation for the Cortex-M4 exists [11]. 

Despite our optimizations, BIKE is outperformed by KEMs based on ideal
lattices on the Cortex-M4, similar to other platforms. BIKE is in most cases
clearly faster than FrodoKEM [22] (based on standard lattices) and SIKE [16]
(based on isogenies), as the numbers in Table 7 show. Compared to the numbers
reported by Chen and Chou [11], the key generation in our BIKE implementation is
more than 73 times faster than the one in McEliece. However, the encapsulation
is about 6 times, and the decapsulation about 2 times slower.

A RISC-V counterpart to the pqm4 framework, called pqriscv, is available on
GitHub, but only as work in progress. At the time of writing this paper, it
includes no post-quantum implementations that could be used to compare the
performance with our work.


Table 7. Cycle counts of the 3rd-round KEMs on the Cortex-M4. All numbers are
from pqm4 [17] commit-6841a6b (excl. bikel1 and mceliece348864).

Scheme (Implementation)         Level   Key Gen       Encaps       Decaps
bikel1 (this work, radix-16)    1       21137291      2989187      50832769
mceliece348864 (m4f) [11]       1       1589600267    482594       2291003
frodokem640aes (m4)             1       48348105      47130922     46594383
kyber512 (m4)                   1       463343        566744       525141
kyber768 (m4)                   3       763979        923856       862176
lightsaber (m4f)                1       361687        513581       498590
saber (m4f)                     3       654407        862856       835122
ntruhps2048509 (m4f)            1       79658656      564411       537473
ntruhps2048677 (m4f)            3       143734184     821524       815516
sikep434 (m4)                   1       48264129      78911465     84276911
sikep610 (m4)                   3       119480622     219632058    221029700


7 Conclusion 


Besides providing the fastest implementation of BIKE for two architectures, this
work presents an interesting case study about bit-polynomial multiplication. Our
measurements underline that, besides the size of the polynomial, the
architecture and the algorithmic embedding of the multiplication are also
important factors to be considered in the pursuit of optimal performance. Not
only game-changing instructions such as a carry-less multiplication instruction
can make the difference, but also a two-part integer multiplication instruction
versus a one-cycle multiply-and-accumulate instruction. The small performance
penalty of our Cortex-M4 implementation in the decoder of BIKE against the FFT
based approach highlights the impact of the algorithmic embedding.

Furthermore, we present an informative example of how RISC-V’s adaptabil- 
ity allows variable solutions for cryptographic implementations. 

While our implementation is safe against timing side channels by adhering to
the constant-time policy, it is not hardened against power side-channel attacks.
A recent work [29] introduces a possible power side-channel attack against BIKE
that exploits the arithmetic move. The application of cmov instructions instead
of arithmetic moves should reduce the leakage, but we did not validate this as
it is out of scope for this paper. Of course, this simple modification alone is
far from a full power side-channel protection, which is an interesting target
for future work.


Acknowledgements. Some of this work was done while Ming-Shing Chen was work- 
ing at Ruhr University Bochum, funded by the Deutsche Forschungsgemeinschaft 
(DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 
2092 CASA - 390781972. The work of Markus Krausz and Jan Philipp Thoma was 
funded by the German Federal Ministry of Education and Research (BMBF) under 
the project “QuantumRISC” (ID 16KIS1038) [26] and project “PQC4MED” (ID 
16KIS1044). 


References 


1. Albrecht, M., et al.: Classic McEliece (2017). https://classic.mceliece.org/

2. Aragon, N., et al.: BIKE-bit flipping key encapsulation (2017).
https://bikesuite.org/

3. Bachmeyer, J., et al.: RISC-V Bit-Manipulation ISA-extensions.
https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0.pdf

4. Becker, L.A.: VexRiscv-Profiler: a measurement tool for the vexriscv.
https://github.com/neunzehnhundert97/VexRiscv-Profiler

5. Bernstein, D.J.: Cache-timing attacks on AES (2005)

6. Bernstein, D.J.: Batch binary edwards. In: Halevi, S. (ed.) CRYPTO 2009. LNCS,
vol. 5677, pp. 317-336. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-03356-8_19

7. Bodrato, M.: Towards optimal Toom-cook multiplication for univariate and
multivariate polynomials in characteristic 2 and 0. In: Carlet, C., Sunar, B.
(eds.) WAIFI 2007. LNCS, vol. 4547, pp. 116-133. Springer, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-73074-3_10

8. Brent, R., Gaudry, P., Thomé, E., Zimmermann, P.: gf2x-1.3.0 (2021).
https://gitlab.inria.fr/gf2x/gf2x

9. Brent, R.P., Gaudry, P., Thomé, E., Zimmermann, P.: Faster multiplication in
GF(2)[x]. In: van der Poorten, A.J., Stein, A. (eds.) ANTS 2008. LNCS, vol.
5011, pp. 153-166. Springer, Heidelberg (2008).
https://doi.org/10.1007/978-3-540-79456-1_10

10. Cantor, D.G., Kaltofen, E.: On fast multiplication of polynomials over
arbitrary algebras. Acta Inform. 28(7), 693-701 (1991)

11. Chen, M.S., Chou, T.: Classic McEliece on the ARM Cortex-M4. IACR Trans.
Cryptogr. Hardw. Embed. Syst. 2021(3), 125-148 (2021).
https://doi.org/10.46586/tches.v2021.i3.125-148,
https://tches.iacr.org/index.php/TCHES/article/view/8970

12. Chen, M.S., Chou, T., Krausz, M.: Optimizing BIKE for the intel Haswell
and ARM Cortex-M4. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021(3),
97-124 (2021). https://doi.org/10.46586/tches.v2021.i3.97-124,
https://tches.iacr.org/index.php/TCHES/article/view/8969

13. Cook, S.A., Aanderaa, S.O.: On the minimum computation time of functions.
Trans. Am. Math. Soc. 142, 291-314 (1969)

14. Fritzmann, T., Sharif, U., Müller-Gritschneder, D., Reinbrecht, C.,
Schlichtmann, U., Sepulveda, J.: Towards reliable and secure post-quantum
co-processors based on RISC-V. In: 2019 Design, Automation Test in Europe
Conference Exhibition (DATE), pp. 1148-1153 (2019).
https://doi.org/10.23919/DATE.2019.8715173

15. Gueron, S., Kounavis, M.: Carry-less multiplication and its usage for
computing the GCM mode. White paper, Intel Corporation (2008)

16. Jao, D., et al.: SIKE (2017). https://sike.org/

17. Kannwischer, M.J., Rijneveld, J., Schwabe, P., Stoffelen, K.: PQM4:
post-quantum crypto library for the ARM Cortex-M4. https://github.com/mupq/pqm4

18. Kannwischer, M.J., Rijneveld, J., Schwabe, P., Stoffelen, K.: pqm4: testing
and benchmarking NIST PQC on ARM Cortex-M4. IACR Cryptology ePrint Archive
2019, 844 (2019). https://eprint.iacr.org/2019/844

19. Karatsuba, A.: Multiplication of multidigit numbers on automata. In: Soviet
Physics Doklady, vol. 7, pp. 595-596 (1963)

20. Melchor, C.A., et al.: Hamming quasi-cyclic (HQC). NIST PQC Round 2, 4-13
(2018)

21. Montgomery, P.L.: Five, six, and seven-term Karatsuba-like formulae. IEEE
Trans. Comput. 54(3), 362-369 (2005)

22. Naehrig, M., et al.: FrodoKEM (2017). https://frodokem.org/

23. Papon, C.: Spinalhdl. https://github.com/SpinalHDL/SpinalHDL

24. Papon, C.: Vexriscv-32 bit RISC-V processor.
https://github.com/SpinalHDL/VexRiscv

25. Pircher, S., Geier, J., Zeh, A., Mueller-Gritschneder, D.: Exploring the
RISC-V vector extension for the Classic McEliece post-quantum cryptosystem. In:
2021 22nd International Symposium on Quality Electronic Design (ISQED), pp.
401-407. IEEE (2021)

26. QuantumRISC: Quantumrisc-next generation cryptography for embedded systems
(2020). https://www.quantumrisc.org/

27. Roy, D.B., Fritzmann, T., Sigl, G.: Efficient hardware/software co-design
for post-quantum crypto algorithm SIKE on ARM and RISC-V based microcontrollers.
In: Proceedings of the 39th International Conference on Computer-Aided Design,
pp. 1-9 (2020)

28. Schönhage, A., Strassen, V.: Schnelle multiplikation grosser zahlen.
Computing 7(3), 281-292 (1971)

29. Sim, B.Y., Kwon, J., Choi, K.Y., Cho, J., Park, A., Han, D.G.: Novel
side-channel attacks on quasi-cyclic code-based cryptography. IACR Transactions
on Cryptographic Hardware and Embedded Systems, pp. 180-212 (2019)

30. Weimerskirch, A., Paar, C.: Generalizations of the Karatsuba algorithm for
efficient implementations. IACR Cryptol. ePrint Arch. 2006, 224 (2006)


Faster Kyber and Dilithium on the Cortex-M4


Amin Abdulrahman(1,2), Vincent Hwang(3,4), Matthias J. Kannwischer(3),
and Daan Sprenkels(5)

1 Ruhr University Bochum, Bochum, Germany
amin.abdulrahman@mpi-sp.org
2 Max Planck Institute for Security and Privacy, Bochum, Germany
3 Academia Sinica, Taipei, Taiwan
matthias@kannwischer.eu
4 National Taiwan University, Taipei, Taiwan
5 Digital Security Group, Radboud University, Nijmegen, The Netherlands
daan@dsprenkels.com


Abstract. This paper presents faster implementations of the lattice-based
schemes Dilithium and Kyber on the Cortex-M4. Dilithium is one of three
signature finalists in the NIST post-quantum project (NIST PQC), while Kyber is
one of four key-encapsulation mechanism (KEM) finalists. Our optimizations
affect the core polynomial arithmetic involving number-theoretic transforms in
both schemes. Our main contributions are threefold: We present a faster signed
Barrett reduction for Kyber, propose to switch to a smaller prime modulus for
the polynomial multiplications $cs_1$ and $cs_2$ in the signing procedure of
Dilithium, and apply various known optimizations to the polynomial arithmetic
in both schemes. Using a smaller prime modulus is particularly interesting as it
allows using the Fermat number transform, resulting in especially fast code.
We outperform the state of the art for both Dilithium and Kyber. For Dilithium,
our NTT and iNTT are faster by 5.2% and 5.7%. Switching to a smaller modulus
results in a speed-up of 33.1%-37.6% for the relevant operations (sum of the
base multiplication and iNTT) in the signing procedure. For Kyber, the
optimizations result in a 15.9%-17.8% faster matrix-vector product, which is a
core arithmetic operation in Kyber.


Keywords: Dilithium · Kyber · NIST PQC · Fermat Number Transform ·
Number-Theoretic Transform · Arm Cortex-M4


1 Introduction 


Lattice-based cryptography appears to be the most promising family of
post-quantum replacements for the public-key cryptography broken by Shor’s
algorithm [Sho94]. As lattice-based key encapsulation schemes and digital
signatures provide reasonable key, ciphertext, and signature sizes and perform
particularly well on a variety of platforms, they are expected to be
standardized soon. One such standardization effort is the NIST PQC [Nat]
project, which aims to find replacements for NIST’s standards for key
establishment and digital signatures as early as 2024. NIST PQC is nearing the
end of its third round with announcements due in early 2022. Among the
third-round finalists are five lattice-based schemes: the three
key-encapsulation mechanisms (KEMs) Kyber, NTRU, and Saber as well as the
digital signature schemes Dilithium and Falcon. As there are only two other
finalists (Classic McEliece and Rainbow) that are not lattice-based, both of
which have excessively large keys, it appears very likely that some of the
lattice-based schemes are going to be selected for standardization unless there
are cryptanalytic breakthroughs.

Lattice-based cryptography is particularly suitable for microcontrollers as
the key material is still of manageable size and computation is fast:
encapsulation and decapsulation take a few milliseconds, while signing and
verification take tens to hundreds of milliseconds. NIST has designated the Arm
Cortex-M4 as the primary microcontroller optimization target for NIST PQC, and,
hence, it has received the most attention so far.

It appears that number-theoretic transforms (NTTs) are at the core of all
high-speed implementations of lattice-based crypto for the Cortex-M4. The NTT is
either prescribed in the specification, as for Dilithium, Falcon, and Kyber, or
remains the fastest polynomial multiplication method, as for Saber, NTRU
[CHK+21], and NTRU Prime [ACC+20].

In this work, we focus on Kyber and Dilithium on the Cortex-M4. They are 
both part of the “Cryptographic Suite for Algebraic Lattices (CRYSTALS)” 
and are both designed to benefit from the NTT. We show that even though 
implementations have been improving for many years, we can still significantly 
improve the involved arithmetic. 


Contributions. The contribution of this work is threefold. Firstly, we apply
various known techniques from work on the Cortex-M4 optimizing Saber, NTRU,
and NTRU Prime. While the techniques are already known, they have so far not
been applied to Kyber and Dilithium. This includes (1) the use of Cooley-Tukey
butterflies for the inverse NTT of both Kyber and Dilithium, previously proposed
for Saber in [ACC+21]; (2) the use of floating point registers for caching
values in the NTT of Dilithium and Kyber, which was first proposed in the
context of NTTs for NTRU Prime in [ACC+20] and allows us to merge more layers of
the NTT and reduce memory access time for loading twiddle factors; (3) the
“asymmetric multiplication” proposed in [BHK+21], which eliminates some
duplicate computation in the base multiplication of Kyber at the cost of extra
stack usage; and (4) an idea from [CHK+21] to improve the accumulation in the
matrix-vector product of Kyber by using a 32-bit accumulator, which eliminates
some modular reductions at the cost of more stack usage.

Secondly, we present a faster Cortex-M4 instruction sequence to implement 
a signed Barrett reduction on packed 16-bit values applicable to the Kyber NTT. 
This immediately improves the Barrett reduction code proposed in [BKS19] from 
8 cycles to 6 cycles per packed reduction. 

Thirdly, we propose to use a different implementation for computing the
products $cs_1$ and $cs_2$ in Dilithium. Since both $c$ and $s_1$/$s_2$ have
very small absolute values, we can switch to a much smaller modulus $q'$ that
allows efficient computation of the product. For Dilithium2 and Dilithium5, we
make use of the Fermat prime $q' = 257$, which allows using a particularly fast
variant of the NTT called the Fermat number transform (FNT), similar to
[LMPR08] for SWIFFT. In contrast to [LMPR08], which implements the FNT on an
Intel processor, we implement the FNT on the Cortex-M4 and make use of its
barrel shifter. For Dilithium3 the FNT does not work as $s_1$ and $s_2$ have
larger values. We instead use an incomplete NTT with $q' = 769$, which is still
much faster than computing it modulo the original Dilithium prime. To the best
of our knowledge, we are the first to propose using a smaller modulus for these
multiplications within Dilithium.
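The choice of $q'$ can be motivated with a short bound check. The sketch below is an illustration under assumptions: the parameters $\tau$ (number of $\pm 1$ coefficients in $c$) and $\eta$ (coefficient bound of $s_1$, $s_2$) are taken from the round-3 Dilithium specification, not from this text, and the criterion used is simply that every coefficient of $cs_i$, bounded in absolute value by $\tau\eta$, must be representable modulo $q'$ in centered form.

```python
# Assumed round-3 Dilithium parameters (not stated in the surrounding text).
PARAMS = {
    "Dilithium2": {"tau": 39, "eta": 2},
    "Dilithium3": {"tau": 49, "eta": 4},
    "Dilithium5": {"tau": 60, "eta": 2},
}

def product_bound(tau, eta):
    """Each coefficient of c*s is a signed sum of tau coefficients of s."""
    return tau * eta

def modulus_suffices(tau, eta, q_small):
    """True if all product coefficients fit into [-(q'-1)/2, (q'-1)/2]."""
    return product_bound(tau, eta) <= (q_small - 1) // 2

fits_257 = {name: modulus_suffices(p["tau"], p["eta"], 257) for name, p in PARAMS.items()}
fits_769 = {name: modulus_suffices(p["tau"], p["eta"], 769) for name, p in PARAMS.items()}
```

Under these assumptions the bound is 78 and 120 for Dilithium2 and Dilithium5, which fits below 128, while the bound of 196 for Dilithium3 exceeds it and requires the larger modulus.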


Code. Our code is open-source and available at
https://github.com/FasterKyberDilithiumM4/FasterKyberDilithiumM4. We will
publish the code alongside the paper under a CC0 copyright waiver.


Structure. Section 2 recalls the preliminaries regarding Kyber, Dilithium, and
the Cortex-M4. In Sect. 3 and Sect. 4, we describe the optimizations applied to
Kyber and Dilithium, respectively. Lastly, in Sect. 5, we present the performance
results and compare them to previous work.


2 Preliminaries 


This section introduces the cryptographic schemes Kyber and Dilithium, which 
are both part of the Cryptographic Suite for Algebraic Lattices (CRYSTALS). 
Furthermore we give a brief introduction into the polynomial multiplication using 
the NTT, revisit the Barrett reduction and present relevant details considering 
our target platform, the Arm Cortex-M4. 


2.1 Notation 


For a prime $q$ and a power of two $n$, we denote the polynomial ring
$\mathbb{Z}_q[X]/(X^n + 1)$ by $R_q$. An element $a \in R_q$ is represented by
a coefficient vector with entries $a_i \in \mathbb{Z}_q$, such that
$a = \sum_{i=0}^{n-1} a_i X^i$. We denote polynomials using lower-case letters
(e.g., $a$), vectors of polynomials using lower-case boldfaced letters (e.g.,
$\mathbf{a}$), and matrices of polynomials using upper-case boldfaced letters
(e.g., $\mathbf{A}$). We symbolize polynomials, vectors, and matrices inside
NTT-domain using $\hat{a}$, $\hat{\mathbf{a}}$, and $\hat{\mathbf{A}}$,
respectively.

Following the definitions from [BDK+20,ABD+20], for an odd $q$ we define
the result of the central reduction $r' = r \bmod^{\pm} q$ as the unique element
in $[-\frac{q-1}{2}, \frac{q-1}{2}]$ satisfying $r' \equiv r \pmod{q}$.
Similarly, we define the result of $r' = r \bmod^{+} q$ as the unique element in
$[0, q)$ satisfying $r' \equiv r \pmod{q}$. For scenarios in which the range of
the reduction result does not matter, we write $r' = r \bmod q$.
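The two reductions defined above can be written as a short Python sketch. The function names `mod_plus` and `mod_pm` are hypothetical; the Kyber prime 3329 is used only as an example modulus.

```python
def mod_plus(r, q):
    """r mod+ q: the representative of r in [0, q)."""
    return r % q

def mod_pm(r, q):
    """r mod± q for odd q: the representative of r in [-(q-1)/2, (q-1)/2]."""
    assert q % 2 == 1
    t = r % q
    return t - q if t > (q - 1) // 2 else t
```

For example, with q = 3329 the value 3328 is reduced to -1 by `mod_pm` and stays 3328 under `mod_plus`.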

The function sampleUniform(-) samples coefficients for polynomials, vectors 
of polynomials, or matrices of polynomials from a uniformly random distribution. 
In case a seed is given as the argument, the output is pseudorandomly generated 
from the seed. 
2.2 Polynomial Multiplications Using the NTT


The NTT is a variant of the discrete Fourier transform (DFT) defined over finite
fields and is commonly used for efficient polynomial multiplications. The
efficiency of this strategy is based on the fact that a polynomial
multiplication inside NTT domain amounts to the coefficient-wise multiplication
of the two polynomials. Specifically, the negacyclic NTT is used for multiplying
polynomials in $\mathbb{Z}_q[X]/(X^n + 1)$.

Computing the negacyclic NTT can be viewed as the evaluation of a polynomial
at powers of a primitive $n$-th root of unity $\zeta_n$ for the polynomial ring
$R_q$ with $q$ prime. Additionally, multiplying all coefficients $a_i$ of
$a \in R_q$ by powers of a $2n$-th root of unity $\zeta_{2n} = \sqrt{\zeta_n}$
is called “twisting” [Ber01].

This comes down to computing

$$\mathrm{NTT}(a) = \hat{a} = \sum_{i=0}^{n-1} \hat{a}_i X^i \quad\text{with}\quad \hat{a}_i = \sum_{j=0}^{n-1} a_j \zeta_{2n}^{j} \zeta_n^{ij}$$

for the forward transform (NTT) and

$$\mathrm{iNTT}(\hat{a}) = a = \sum_{i=0}^{n-1} a_i X^i \quad\text{with}\quad a_i = n^{-1} \zeta_{2n}^{-i} \sum_{j=0}^{n-1} \hat{a}_j \zeta_n^{-ij}$$

for the inverse transform (iNTT) [AB74]. The powers of the roots of unity used
during the computation of the NTT are also frequently called “twiddle factors”.
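The two formulas can be checked with a direct, quadratic-time Python sketch. The toy parameters q = 17, n = 4 and the root $\zeta_{2n} = 2$ (which has order 8 modulo 17) are chosen here purely for illustration and do not correspond to Kyber or Dilithium; negative exponents in `pow` require Python 3.8 or later.

```python
Q, N = 17, 4            # toy parameters, chosen only for illustration
ZETA_2N = 2             # primitive 2n-th root of unity mod 17 (2**4 = -1)
ZETA_N = ZETA_2N * ZETA_2N % Q

def ntt(a):
    """Forward transform, evaluated directly from the defining sum."""
    return [
        sum(a[j] * pow(ZETA_2N, j, Q) * pow(ZETA_N, i * j, Q) for j in range(N)) % Q
        for i in range(N)
    ]

def intt(a_hat):
    """Inverse transform, evaluated directly from the defining sum."""
    n_inv = pow(N, -1, Q)
    out = []
    for i in range(N):
        s = sum(a_hat[j] * pow(ZETA_N, -i * j, Q) for j in range(N))
        out.append(n_inv * pow(ZETA_2N, -i, Q) * s % Q)
    return out

def negacyclic_schoolbook(f, g):
    """Reference product in Z_q[X]/(X^n + 1)."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1   # X^n wraps around to -1
            out[(i + j) % N] += sign * f[i] * g[j]
    return [c % Q for c in out]

def mul_via_ntt(f, g):
    """Coefficient-wise product in NTT domain, then inverse transform."""
    fh, gh = ntt(f), ntt(g)
    return intt([x * y % Q for x, y in zip(fh, gh)])
```

Fast implementations replace the quadratic sums by the butterfly networks discussed next; the sketch only confirms that the two formulas invert each other and that multiplication in NTT domain is coefficient-wise.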

For computing the NTT itself efficiently, fast Fourier transform (FFT)
algorithms, which only require $O(n \log n)$ operations, are commonly used. This
algorithm was first described by Gauss in 1805 [Gau66] but it is also
oftentimes credited to Cooley and Tukey who published the same algorithm in 1965
[CT65]. The basic idea of the algorithm is to split the computation of a length
$n$ NTT into, most commonly, two separate number-theoretic transforms (NTTs)
with an input size of $n/2$ each. Formally, we compute the isomorphism
$R_q \cong \prod_i \mathbb{Z}_q[X]/(X - \zeta_{2n}^{i})$ for
$i = 1, 3, 5, \ldots, 2n - 1$ as given by the Chinese Remainder Theorem (CRT),
as also explained in [BDK+20, Section 2.2]. For example, in the first instance
we map $\mathbb{Z}_q[X]/(X^n + 1)$ to
$\mathbb{Z}_q[X]/(X^{n/2} - \zeta_{2n}^{n/2}) \times \mathbb{Z}_q[X]/(X^{n/2} + \zeta_{2n}^{n/2})$.
This splitting is usually repeated for $\log_2 n$ iterations, called “NTT
layers”, where the results of the $i$-th layer are the remainders of polynomials
$a \bmod (X^{n/2^i} \pm \zeta_{2n}^{j})$ for some $j$. Computing these
remainders involves $n/2$ so-called butterfly operations per layer. The
Cooley-Tukey (CT) butterfly, consisting of one addition, one subtraction, and
one multiplication in $\mathbb{Z}_q$, is depicted in Fig. 1a.

While the CT algorithm is frequently used for computing the NTT, the
Gentleman-Sande (GS) FFT algorithm is commonly deployed for computing the iNTT.
In contrast to this, we use the CT algorithm for the computation of the NTT and
its inverse. A depiction of the GS butterfly is given in Fig. 1b.
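The two butterflies can be sketched in a few lines of Python. The modulus 3329 (Kyber's prime) and the twiddle factor 17 are used here only as example values, and the function names are hypothetical.

```python
Q = 3329  # example modulus (Kyber's prime)

def ct_butterfly(a, b, zeta):
    """Cooley-Tukey: (a, b) -> (a + zeta*b, a - zeta*b)."""
    t = zeta * b % Q
    return (a + t) % Q, (a - t) % Q

def gs_butterfly(a, b, zeta):
    """Gentleman-Sande: (a, b) -> (a + b, (a - b)*zeta)."""
    return (a + b) % Q, (a - b) * zeta % Q
```

Applying the GS butterfly with the inverse twiddle factor to the output of a CT butterfly returns the inputs scaled by two, which is why the inverse transform carries the factor $n^{-1}$ after $\log_2 n$ layers.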

Fig. 1. NTT butterfly operations: (a) Cooley-Tukey butterfly, (b) Gentleman-Sande butterfly.

Using this method, the product of $f, g \in R_q$ can be efficiently computed as
$\mathrm{iNTT}(\mathrm{NTT}(f) \circ \mathrm{NTT}(g))$, where $\circ$ denotes the base
multiplication of two polynomials. In case the NTT is computed on $\log_2 n$ layers,
base multiplication is equal to coefficient-wise multiplication requiring only $n$
multiplications. In case the NTT is computed on $l < \log_2 n$ layers, yielding $2^l$
polynomials modulo $X^m - \omega$ for $m = n/2^l$ and $\omega$ a power of a root of unity,
it is called an "incomplete" NTT. For this scenario, the base multiplication corresponds
to pairwise $m \times m$ schoolbook multiplications. This idea was initially introduced in
[LS19] for the case of the modulus not supporting an NTT on $\log_2 n$ layers, but it is
also applied for performance reasons in several other implementations, for example,
[ABCG20, CHK+21, ACC+21].
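As a sanity check of the identity $f \cdot g = \mathrm{iNTT}(\mathrm{NTT}(f) \circ \mathrm{NTT}(g))$ for a complete NTT, the following sketch compares it against schoolbook multiplication modulo $X^n + 1$. The toy parameters ($q = 257$, $n = 8$, root of unity 2) and all function names are assumptions for illustration; the transforms are written as plain $O(n^2)$ evaluations for readability rather than as fast butterflies.

```python
# Illustrative sketch with assumed toy parameters (not from the paper).
Q, N, ZETA = 257, 8, 2   # 2 is a primitive 16th root of unity modulo 257


def ntt(a):
    """Evaluate a at the odd powers ZETA^(2i+1), the roots of X^N + 1."""
    return [sum(c * pow(ZETA, (2 * i + 1) * j, Q) for j, c in enumerate(a)) % Q
            for i in range(N)]


def intt(a_hat):
    """Inverse of ntt(): interpolate and scale by N^-1."""
    n_inv, z_inv = pow(N, -1, Q), pow(ZETA, -1, Q)
    return [n_inv * sum(v * pow(z_inv, (2 * i + 1) * j, Q)
                        for i, v in enumerate(a_hat)) % Q
            for j in range(N)]


def base_mul(f_hat, g_hat):
    """Base multiplication for a complete NTT: coefficient-wise products."""
    return [(x * y) % Q for x, y in zip(f_hat, g_hat)]


def schoolbook_negacyclic(f, g):
    """Reference product in Z_Q[X]/(X^N + 1), using X^N = -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += f[i] * g[j]
            else:
                c[i + j - N] -= f[i] * g[j]
    return [x % Q for x in c]


def ntt_multiply(f, g):
    return intt(base_mul(ntt(f), ntt(g)))
```

For an incomplete NTT the only change is in `base_mul`, which would multiply small polynomials modulo $X^m - \omega$ instead of single coefficients.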


2.3 Fermat Number Transform 


The Fermat number transform (FNT) is a special case of the NTT in which the
modulus is a Fermat number $F_t := 2^{2^t} + 1$. It was introduced in [SS71] for
large integer multiplications and in [AB74, AB75] for digital convolutions. In
this paper, we implement the FNT for negacyclic convolutions. For an arbitrary $F_t$ as
the modulus, cyclic transformations of sizes dividing $2^{t+2}$ are supported [AB74,
AB75]. For computing a negacyclic transformation of size $n = 2^{t+1}$ with
$\zeta_{2n} = \sqrt{2}$, the first split becomes

$$
\begin{aligned}
\mathbb{Z}_{F_t}[X]/(X^{n} + 1) &\cong \mathbb{Z}_{F_t}[X]/(X^{n/2} - 2^{2^{t-1}}) \times \mathbb{Z}_{F_t}[X]/(X^{n/2} + 2^{2^{t-1}}) \\
&= \mathbb{Z}_{F_t}[X]/(X^{n/2} - 2^{2^{t-1}}) \times \mathbb{Z}_{F_t}[X]/(X^{n/2} - 2^{2^{t-1} + 2^{t}}).
\end{aligned}
$$

After applying $t$ layers, all of the polynomial rings are of the form
$\mathbb{Z}_{F_t}[X]/(X^{2} - 2^{j})$ where $j$ is an odd number. Since $\zeta_{2n}^2 = 2$,
we can apply one more split.
Furthermore, if $F_t$ is a prime, then we can compute cyclic transformations of
sizes up to $2^{2^t} = F_t - 1$ and negacyclic transformations up to $2^{2^t - 1}$. Since the
twiddles in the initial $t$ layers are powers of two, we can multiply with the twiddles
using shift operations, which are much cheaper than explicit multiplications on
many platforms. Note that the only known prime Fermat numbers are $F_0 = 3$,
$F_1 = 5$, $F_2 = 17$, $F_3 = 257$, and $F_4 = 65537$. Out of those, only $F_3$ and $F_4$ appear
promising for use in Dilithium. They allow computing 3 or 4 layers using
only shifts.
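The properties used above are easy to check numerically. The sketch below (helper names are hypothetical) confirms that 2 has multiplicative order $2^{t+1}$ modulo $F_t$, that the first-layer twiddle $2^{2^{t-1}}$ squares to $-1$, and that multiplying by a power-of-two twiddle is a shift followed by a reduction.

```python
# Illustrative sketch; helper names are assumptions, not from the paper.

def fermat(t):
    """Return the Fermat number F_t = 2^(2^t) + 1."""
    return 2 ** (2 ** t) + 1


def is_prime(n):
    """Trial division, sufficient for the small Fermat numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def order_of_two(q):
    """Multiplicative order of 2 modulo q."""
    k, x = 1, 2 % q
    while x != 1:
        x = (x * 2) % q
        k += 1
    return k


def mul_by_twiddle(a, log_w, q):
    """Multiply a by the twiddle 2^log_w using a shift instead of a multiplication."""
    return (a << log_w) % q
```

For $F_3 = 257$, for example, `mul_by_twiddle(a, 4, 257)` multiplies by the first-layer twiddle $2^4$, matching the split into $X^{128} \pm 2^4$ used later for Dilithium.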

2.4 Kyber


Kyber [ABD+20] is an IND-CCA2-secure lattice-based key-encapsulation mechanism
(KEM) constructed from an IND-CPA-secure public-key encryption
scheme, Kyber.CPAPKE, using a variant of the Fujisaki-Okamoto (FO) transform
[FO99]. The security of the scheme is based on the hardness of the
module learning with errors (MLWE) problem, a trade-off between the ring
learning with errors (RLWE) problem and the learning with errors (LWE) problem
[ABD+20, Section 1.5]. Kyber is one of four round-three KEM finalists in
the NIST PQC project [Nat], next to Saber [DKRV20], NTRU [CDH+20], and Classic
McEliece [ABC+20].


Parameters. Kyber uses the prime $q = 3329$ and $n = 256$.
Thus, it operates on $R_q = \mathbb{Z}_{3329}[X]/(X^{256} + 1)$ [ABD+20, Section 1.4]. The
specification defines three different security levels of Kyber, namely Kyber-512
($k = 2, \eta_1 = 3$), Kyber-768 ($k = 3, \eta_1 = 2$), and Kyber-1024 ($k = 4, \eta_1 = 2$)
[ABD+20, Section 1.4]. Since $q$ and $n$ remain constant across
the three parameter sets, almost all optimizations apply to all variants.
For the specification of Kyber, we refer to [ABD+20] and omit the description.


Number Theoretic Transform. Since polynomial multiplication is among
the most costly operations for Kyber, the polynomial ring has been chosen such
that Kyber can profit from efficient polynomial multiplication using the NTT.
For $q = 3329$, as deployed in Kyber, no primitive 512-th but only primitive
256-th roots of unity exist, with the first one being $\zeta_n = 17$ [ABD+20].
This means that the defining polynomial of $R_q$, $X^{256} + 1$, factors into 128
polynomials of degree two and not into 256 polynomials of degree one. Therefore,
the result of the NTT of $f \in R_q$ is a vector of 128 polynomials of degree one.
Thus, in contrast to Sect. 2.2, the coefficients $\hat{a}_i$ inside NTT domain are given by

$$
\hat{a}_{2i} = \sum_{j=0}^{127} a_{2j}\, \zeta_n^{(2\,\mathrm{br}_7(i)+1)j}
\quad\text{and}\quad
\hat{a}_{2i+1} = \sum_{j=0}^{127} a_{2j+1}\, \zeta_n^{(2\,\mathrm{br}_7(i)+1)j}
$$

as defined in [ABD+20]. The function $\mathrm{br}_7$ computes the bit reversal of its
argument, interpreted as a 7-bit integer.

The absence of a primitive 512-th root of unity also has an impact on the
base multiplication of two polynomials inside NTT domain: Instead of coefficient-wise
multiplication, we need to perform schoolbook multiplications of size $2 \times 2$,
i.e., we need to compute 128 products modulo $(X^2 - \zeta_n^{2\,\mathrm{br}_7(i)+1})$ [ABD+20].
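The formulas above can be checked with a direct, unoptimized Python transcription. The sketch below is an assumption-laden illustration (function names are hypothetical, and the transform is computed as a plain sum rather than with butterflies): it evaluates the 7-layer Kyber NTT from its definition and verifies that the $2 \times 2$ base multiplication agrees with multiplication in $\mathbb{Z}_{3329}[X]/(X^{256} + 1)$.

```python
# Illustrative sketch: direct evaluation of the formulas, not an optimized NTT.
import random

Q, N, ZETA = 3329, 256, 17


def br7(i):
    """Bit reversal of a 7-bit integer."""
    return int(format(i, "07b")[::-1], 2)


# One twiddle per quadratic factor X^2 - ZETA^(2*br7(i)+1).
TWIDDLES = [pow(ZETA, 2 * br7(i) + 1, Q) for i in range(128)]


def kyber_ntt(a):
    """a_hat[2i] + a_hat[2i+1]*X = a mod (X^2 - TWIDDLES[i])."""
    out = [0] * N
    for i, w in enumerate(TWIDDLES):
        powers = [pow(w, j, Q) for j in range(128)]
        out[2 * i] = sum(a[2 * j] * powers[j] for j in range(128)) % Q
        out[2 * i + 1] = sum(a[2 * j + 1] * powers[j] for j in range(128)) % Q
    return out


def base_mul(a_hat, b_hat):
    """128 products of degree-one polynomials modulo X^2 - TWIDDLES[i]."""
    c = [0] * N
    for i, w in enumerate(TWIDDLES):
        a0, a1 = a_hat[2 * i], a_hat[2 * i + 1]
        b0, b1 = b_hat[2 * i], b_hat[2 * i + 1]
        c[2 * i] = (a0 * b0 + a1 * b1 * w) % Q
        c[2 * i + 1] = (a0 * b1 + a1 * b0) % Q
    return c


def schoolbook_negacyclic(f, g):
    """Reference product in Z_Q[X]/(X^256 + 1)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += f[i] * g[j]
            else:
                c[i + j - N] -= f[i] * g[j]
    return [x % Q for x in c]
```

Because the transform is a ring homomorphism, transforming the schoolbook product gives the same result as base-multiplying the two transformed inputs.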


2.5 Dilithium 


Dilithium [DKL+18, BDK+20] is a lattice-based digital signature scheme based
on the "Fiat-Shamir with Aborts" approach [Lyu09]. Its security is based on the
hardness of the modular short integer solution (MSIS) and MLWE problems, and
it is currently among the three signature finalists in the NIST PQC project [Nat],
next to Falcon [FHK+20] and Rainbow [CDK+20].

Parameters. Dilithium deploys the prime $q = 8380417 = 2^{23} - 2^{13} + 1$ and
operates on the polynomial ring $R_q = \mathbb{Z}_q[X]/(X^n + 1)$ with $n = 256$. The two
parameters $q$ and $n$ are the same across all parameter sets.

Dilithium offers three different parameter sets, namely Dilithium2, Dilithium3,
and Dilithium5, which target the three NIST security levels 2, 3, and 5. More
details on the differences between the three parameter sets can be obtained from
Table 1. The matrix dimension is given by $(k, l)$, the bound for sampling the
secret key by $\eta$, the number of $\pm 1$ coefficients in the challenge polynomial $c$
by $\tau$, and #reps refers to the expected number of repetitions during the rejection
sampling in the signature generation process [BDK+20]. The parameters $\gamma_1$ and
$\gamma_2$ define the coefficient range of $y$ and the low-order rounding range [BDK+20].


Table 1. Overview of Dilithium's parameter sets [BDK+20]


Scheme      NIST level  (k, l)  η  τ   γ1    γ2         #reps  |pk|    |sig|
Dilithium2  2           (4, 4)  2  39  2^17  (q-1)/88   4.25   1312 B  2420 B
Dilithium3  3           (6, 5)  4  49  2^19  (q-1)/32   5.1    1952 B  3293 B
Dilithium5  5           (8, 7)  2  60  2^19  (q-1)/32   3.85   2592 B  4595 B


We refer to [BDK+20] for the specification of Dilithium and omit the descrip- 
tion. 


Number Theoretic Transform. Since the main algebraic operations used 
by Dilithium are polynomial multiplications, Dilithium’s ring was chosen in such 
a way that the NTT can be applied [BDK+20]. In contrast to Kyber, for the 
Dilithium ring, a 2n-th primitive root of unity r = 1753 exists [BDK+20] and 
thus it is possible to compute a complete NTT with eight layers as described in 
Sect. 2.2. This allows for base multiplication by coefficient-wise multiplication. 
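The difference between the two rings comes down to which roots of unity exist modulo $q$. A minimal check, assuming a hypothetical helper `is_primitive_root_of_unity` that only handles power-of-two orders:

```python
# Illustrative sketch; the helper name is an assumption.

def is_primitive_root_of_unity(r, order, q):
    """True if r has multiplicative order exactly `order` (a power of two) mod q."""
    return pow(r, order, q) == 1 and pow(r, order // 2, q) != 1


DILITHIUM_Q = 2**23 - 2**13 + 1   # 8380417
KYBER_Q = 3329
```

With these definitions, 1753 is a primitive 512-th root of unity modulo the Dilithium prime, so $X^{256} + 1$ splits into linear factors and all eight layers can be computed, whereas modulo the Kyber prime 17 only has order 256 and no element of order 512 can exist because 512 does not divide $q - 1$.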


2.6 Barrett Reduction 


The Barrett reduction [Bar87] is an efficient algorithm for reductions in $\mathbb{Z}_q$.
Besides its performance, one advantage is that it can be easily implemented in
constant time. A variant of the Barrett reduction that operates on signed integers
has been presented in [Sei18, Algorithm 5], which has also been deployed in a
previous implementation of Kyber [ABCG20]. Algorithm 2.1 is an illustration.


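Algorithm 2.1: Signed Barrett Reduction [ABCG20]


Input : $q$ with $0 < q < \beta/2$, $2 \nmid q$, and $a$ with $-\beta/2 \le a < \beta/2$
Output: $r$ with $r \equiv a \pmod{q}$, $0 \le r \le q$


1 $v \leftarrow \lfloor 2^{\lfloor \log_2 q \rfloor - 1} \beta / q \rceil$    ▷ precomputed
2 $t \leftarrow \lfloor a v / (2^{\lfloor \log_2 q \rfloor - 1} \beta) \rfloor$    ▷ signed high product and arithmetic right shift
3 $t \leftarrow t q \bmod \beta$    ▷ signed low product
4 return $r \leftarrow a - t$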
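The four steps of Algorithm 2.1 can be modelled in Python as follows. The sketch assumes a 16-bit word size ($\beta = 2^{16}$) and the Kyber prime as defaults; `to_signed` and `signed_barrett` are hypothetical names, and the explicit wrapping only emulates what fixed-width registers do implicitly.

```python
# Illustrative model of the signed Barrett reduction; names and the default
# word size (beta = 2^16) are assumptions made for this sketch.

def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def signed_barrett(a, q=3329, log_beta=16):
    beta = 1 << log_beta
    assert 0 < q < beta // 2 and q % 2 == 1
    assert -(beta // 2) <= a < beta // 2
    shift = q.bit_length() - 2                        # floor(log2 q) - 1
    v = ((1 << (shift + log_beta)) + q // 2) // q     # step 1: rounded constant
    t = (a * v) >> (shift + log_beta)                 # step 2: floor via arithmetic shift
    t = to_signed(t * q, log_beta)                    # step 3: signed low product
    return to_signed(a - t, log_beta)                 # step 4
```

For $q = 3329$ the precomputed constant is $v = 20159$. Note that the output range includes $q$ itself: negative multiples of $q$ are mapped to $q$ rather than to $0$.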




2.7 Arm Cortex-M4 


The target platform for our implementation is the Arm Cortex-M4(F), which 
is a NIST-recommended evaluation platform for the candidates of the NIST 
PQC project. The Arm Cortex-M4 is based on the Armv7E-M instruction set 
architecture with 14 usable 32-bit general purpose registers. Additionally, on 
the Cortex-M4F, there are 32 single-precision floating-point registers [ARM11]. 

The instruction set also provides a number of powerful digital signal processing
(DSP) instructions, which perform arithmetic operations on
two halfwords or four bytes at the same time and have proven
beneficial in numerous implementations [BKS19, ABCG20, KMSRV18]
of Kyber [ABD+20] and Saber [DKRV20]. In particular, the instructions
smul{b,t}{b,t} multiply specific halfwords, and smla{b,t}{b,t} multiply
specific halfwords and add the product to the specified accumulator. Additionally,
the instructions smuad{,x} perform two halfword multiplications and
add up their products, while smlad{,x} perform two halfword multiplications
and add the sum of their products to an accumulator. All of these
instructions take one cycle to execute. Moreover, the Cortex-M4 can compute
the 64-bit product of two 32-bit values (optionally, with accumulation) in a single
cycle. Furthermore, the Cortex-M4 provides a barrel shifter for shifting or
rotating the second operand of certain instructions at no additional cost.

On the Cortex-M4, store instructions always take a single cycle, while a
sequence of $n$ independent loads takes $n + 1$ cycles. Using the vldm instruction, it
is possible to load data directly from memory into the floating-point registers.
This also consumes $n + 1$ cycles for $n$ data words.
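To make the halfword notation concrete, the following Python sketch models the arithmetic of a few of these instructions on registers that hold two packed signed 16-bit values. It is a behavioural model only (no flags, no timing), and the helper names `pack`, `top`, and `bot` are assumptions.

```python
# Behavioural sketch of some DSP instructions on packed halfwords.
# Only the arithmetic is modelled; helper names are assumptions.

def s16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def s32(x):
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & (1 << 31) else x


def pack(t, b):
    """Build a 32-bit register value from top and bottom signed halfwords."""
    return ((t & 0xFFFF) << 16) | (b & 0xFFFF)


def top(r):
    return s16(r >> 16)


def bot(r):
    return s16(r)


def smulbb(rn, rm):
    return s32(bot(rn) * bot(rm))


def smultb(rn, rm):
    return s32(top(rn) * bot(rm))


def smuad(rn, rm):
    """Dual multiply and add: bottom*bottom + top*top."""
    return s32(bot(rn) * bot(rm) + top(rn) * top(rm))


def smuadx(rn, rm):
    """As smuad, but with the halfwords of the second operand exchanged."""
    return s32(bot(rn) * top(rm) + top(rn) * bot(rm))


def smlad(rn, rm, ra):
    """Dual multiply, add both products to the accumulator ra."""
    return s32(ra + bot(rn) * bot(rm) + top(rn) * top(rm))
```

As a usage example, with `a = pack(a1, a0)` and `b = pack(b1, b0)`, the cross term $a_0 b_1 + a_1 b_0$ of a degree-one polynomial product is a single `smuadx(a, b)`, which is why these instructions suit the $2 \times 2$ base multiplication in Kyber.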


3 Improvements to Kyber Implementations 


For Kyber, we propose several optimizations for implementing NTT and iNTT and 
some speed optimizations to the matrix-vector product at the cost of a higher 
stack usage. We provide one implementation with all optimizations and one with 
only the optimizations that do not impact the stack usage. 

We base our implementations on [ABCG20] and the implementation in the 
pqm4 [KRSS19] project. In the following we focus on our contributions and omit 
details of the numerous optimizations present in previous implementations. 


3.1 NTT 


Caching in FPU Registers. For Kyber, on layers 7-4, 15 twiddle factors
are required and re-used multiple times throughout the iterations. By using the
floating-point registers for caching the twiddle factors, the number of cycles spent on
memory loads is reduced. This technique has proven beneficial in
past work [ACC+20, CHK+21, ACC+21]. In our implementations, we load the
15 twiddle factors (packed into eight registers) into the floating-point registers
once with the vldm instruction in nine cycles. Then, in each iteration, the twiddle
factors are fetched from the floating-point registers with vmov in a single cycle
each.

On the three remaining layers, it is not beneficial to make use of the floating 
point registers because in each of the 16 iterations at least one unique twiddle 
factor per layer is required, meaning none of the twiddle factors are re-used. 


Better Layer Merging. In our implementations we make use of the common
optimization strategy of merging layers of the NTT computation [GOPS13]. The
idea behind this strategy is to load multiple coefficients at once such that more
than one layer of the NTT can be computed at a time. This reduces the number of
memory operations required at the cost of taking up more registers. The
state-of-the-art implementation of Kyber [ABCG20] also deploys this strategy, merging
layers 7-5 and 4-2 while computing layer 1 separately.

By making use of the floating-point registers, we instead implement the NTT
by merging layers 7-4 and 3-1. Layers 7-4 can be merged by first computing
three layers of the NTT on each of $(a_1, a_3, a_5, a_7, a_9, a_{11}, a_{13}, a_{15})$ and
$(a_0, a_2, a_4, a_6, a_8, a_{10}, a_{12}, a_{14})$ and then combining their results. First, the
NTT on $(a_1, a_3, \ldots, a_{15})$ is computed and each of the layer 5 outputs is multiplied
by the corresponding twiddle factor of the fourth layer. Then, $(a_1, a_3, \ldots, a_{15})$ are
moved to the floating-point registers for later use. After that, the polynomials
$(a_0, a_2, \ldots, a_{14})$ are loaded and the NTT is computed on them. Finally, we vmov
$(a_0, a_2, \ldots, a_{14})$ one at a time and compute the final add-sub. In summary, this
requires 128 additional vmovs, whereas a separate layer requires 128 loads and 128 stores.


3.2 Inverse NTT 


The most significant change we apply to the inverse NTT is the switch from 
Gentleman-Sande butterflies to Cooley-Tukey butterflies. Therefore, all of the 
optimizations mentioned in the context of the NTT also apply to the inverse NTT. 


Switch to CT-Butterflies. In previous implementations of Kyber for the
Arm Cortex-M4, the NTT was always implemented using CT butterflies, while
the inverse NTT was implemented using GS butterflies, which is a commonly
seen pattern for implementations using the NTT in general. In contrast, we
implement the inverse NTT using CT butterflies in order to avoid the necessity
of intermediate modular reductions by limiting the coefficients' growth, as, for
example, suggested in [Sei18, Section 2.1] or implemented for Saber in [ACC+21].

Using CT butterflies for the inverse NTT requires additional twisting
during the computation of the last layer, but the total number of multiplications
does generally not increase because the same number of multiplications can be
omitted during the butterfly operations ("light butterflies"). One side effect of
this approach is that some coefficients grow larger than in the forward NTT:
the multiplications in the butterflies always include reductions, but the
operands of the addition and subtraction in a butterfly are now not always
bounded by such a reduction. To counteract this, we insert two modular
multiplications on the fourth layer to limit the growth of the coefficients to
$(-9q, 9q)$ at most after the fourth layer. By detailed range analysis, we found
that on the last three layers we need 20 additional reductions on packed
arguments in total.

Moreover, the Montgomery multiplication during the twisting removes the
need for a separate Barrett reduction of every coefficient at the end of the last
layer. This saves 256 Barrett reductions.

Note that due to the new structure of the iNTT, the input coefficients' absolute
values need to be smaller than $q$.


Algorithm 3.1: Packed Barrett Reduction [BKS19]

Input : $a = (a_t \,\|\, a_b)$
Output: $c = (c_t \,\|\, c_b) \bmod q$

smulbb t0, a, ⌊2^26/q⌉
smultb t1, a, ⌊2^26/q⌉
asr    t0, t0, #26
asr    t1, t1, #26
smulbb t0, t0, q
smulbb t1, t1, q
pkhbt  t0, t0, t1, lsl #16
usub16 c, a, t0


Algorithm 3.2: Improved Packed Barrett Reduction

Input : $a = (a_t \,\|\, a_b)$
Output: $c = (c_t \,\|\, c_b) \bmod q$

smlawb t0, -⌊2^32/q⌉, a, 2^15
smlabt t0, q, t0, a
smlawt t1, -⌊2^32/q⌉, a, 2^15
smulbt t1, q, t1
add    t1, a, t1, lsl #16
pkhbt  c, t0, t1, lsl #16


3.3 Faster Barrett Reduction 


Similar to previous implementations, we deploy the Barrett reduction to reduce
the coefficients. The Barrett reduction of two 16-bit integers packed in one 32-bit
register has been previously implemented [BKS19] as shown in Algorithm 3.1.
Using the smlaw{b,t} instructions as in Algorithm 3.2, the cycle count of one
Barrett reduction is reduced by one. This means that two cycles are saved when
reducing a packed argument. In contrast to the implementation from Algorithm 3.1, the
technique presented in Algorithm 3.2 requires two Barrett constants, which are
both different from the previous one. Moreover, using this optimization removes
the guarantee of the reduction's result being in $[0, q)$; instead, the result lies in
$[-\frac{q-1}{2}, \frac{q+1}{2})$ for an odd $q$. Therefore, its output must not be passed to one of the
packing or compression functions because they assume the input to be in $[0, q)$.
This means that it may not be used in the poly_reduce function, but it can be used
inside the NTT and iNTT.
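The change in output range can be reproduced with a small arithmetic model. The sketch below does not emulate the instruction sequences; it only mirrors the two underlying computations per halfword, assuming the constants $\lfloor 2^{26}/q \rceil$ (flooring variant) and $-\lfloor 2^{32}/q \rceil$ together with the rounding offset $2^{15}$ (rounding variant). All function names are hypothetical.

```python
# Arithmetic sketch of the two Barrett variants for q = 3329. It models the
# underlying computation per 16-bit half, not the exact instruction sequence.
Q = 3329


def s16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pack(t, b):
    return ((t & 0xFFFF) << 16) | (b & 0xFFFF)


def barrett_floor(a):
    """Multiply by round(2^26/q), shift right by 26: result lies in [0, q]."""
    v = ((1 << 26) + Q // 2) // Q            # 20159
    t = (a * v) >> 26
    return a - t * Q


def barrett_round(a):
    """Multiply by -round(2^32/q), add 2^15 after a 16-bit shift, keep the top
    half: this rounds a/q to the nearest integer, giving a centered result."""
    c = -(((1 << 32) + Q // 2) // Q)
    t = ((c * a) >> 16) + (1 << 15)
    m = t >> 16                              # equals -round(a / q)
    return a + m * Q


def reduce_packed(word, reduce_half):
    """Apply a per-halfword reduction to both halves of a packed register."""
    return pack(reduce_half(s16(word >> 16)), reduce_half(s16(word)))
```

For example, `reduce_packed(pack(-3000, 5000), barrett_round)` yields the halves 329 and -1658, both inside the centered range, whereas the flooring variant never produces negative outputs.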


3.4 Matrix-Vector Product 


For speed optimization of the matrix-vector product, we implement two tech- 
niques. Both of them require additional stack space and therefore, if a low mem- 
ory footprint is a concern, the applicability needs to be checked. Further, we 
re-implement the C function for the computation of the matrix-vector product 
in assembly which allows us to significantly lower the number of function calls 
required by efficiently using the registers and making use of macros. We proceed 
similarly for the inner product in the decryption. 


Asymmetric Multiplication. For the computation of the matrix-vector product
$As$ in Kyber, we compute $\mathrm{iNTT}(\hat{A} \circ \mathrm{NTT}(s))$. During this computation, every
row of $\hat{A}$ needs to be multiplied by $\hat{s}$. Therefore, it is a common strategy to cache
$\hat{s}$ instead of recomputing it for every row of $\hat{A}$ [BKS19]. Using a
trick for integer multiplication presented in [BDL+11], the authors of [BHK+21] extended
this concept to the case in which incomplete NTTs are deployed.

Recall that the Kyber NTT is incomplete, i.e., 7 instead of 8 layers are
computed, and therefore the product of two polynomials inside NTT domain,
$\hat{a} \circ \hat{s} = \hat{c}$, consists of 128 $2 \times 2$ schoolbook multiplications. For computing
$\hat{c}_{2i} + \hat{c}_{2i+1}X = (\hat{a}_{2i} + \hat{a}_{2i+1}X)(\hat{s}_{2i} + \hat{s}_{2i+1}X) \bmod (X^2 - \zeta_n^{2\,\mathrm{br}_7(i)+1})$, we have
$\hat{c}_{2i} = \hat{a}_{2i}\hat{s}_{2i} + \hat{a}_{2i+1}\hat{s}_{2i+1}\zeta_n^{2\,\mathrm{br}_7(i)+1}$ and $\hat{c}_{2i+1} = \hat{a}_{2i}\hat{s}_{2i+1} + \hat{s}_{2i}\hat{a}_{2i+1}$.

The idea behind the proposal from [BHK+21, Section 4.2] is that during the
computation of $\hat{A} \circ \hat{s}$, each polynomial of $\hat{s}$ is used $k$ times, which means that
the computation of $\hat{s}_{2i+1}\zeta_n^{2\,\mathrm{br}_7(i)+1}$ is repeated $k$ times. This can be avoided by
caching the intermediate results $\hat{s}_{2i+1}\zeta_n^{2\,\mathrm{br}_7(i)+1}$ in a separate vector $\hat{s}'$.

We implement two separate variants of the base multiplication, one of which
is only used for the first row of the matrix in the matrix-vector product, while the
other one is used for all of the following rows. The first variant computes the same
base multiplication as before except that it stores the result of
$\hat{s}_{2i+1}\zeta_n^{2\,\mathrm{br}_7(i)+1}$ separately. This comes at the cost of two additional stores
and one additional load from the stack for the argument containing the address of $\hat{s}'$
per two polynomial multiplications. The second variant saves two smultb instructions,
two Montgomery reductions, and the load of one twiddle factor per two polynomials by
loading the cached values instead. The precomputed vector can also be re-used
in the inner product following the matrix-vector multiplication in encryption.
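The saving can be illustrated by counting modular multiplications in a plain Python model. The sketch below is a simplification under stated assumptions: it precomputes the cache $\hat{s}'$ in a separate pass instead of during the first matrix row, it counts only multiplications (not loads, stores, or reductions), and all function names are hypothetical.

```python
# Illustrative model of asymmetric multiplication; simplified versus the text
# (cache built in a separate pass) and with assumed function names.
import random

Q, ZETA = 3329, 17


def br7(i):
    return int(format(i, "07b")[::-1], 2)


TWIDDLES = [pow(ZETA, 2 * br7(i) + 1, Q) for i in range(128)]


def basemul_plain(a, s):
    c, mults = [0] * 256, 0
    for i, z in enumerate(TWIDDLES):
        a0, a1, s0, s1 = a[2 * i], a[2 * i + 1], s[2 * i], s[2 * i + 1]
        c[2 * i] = (a0 * s0 + a1 * ((s1 * z) % Q)) % Q
        c[2 * i + 1] = (a0 * s1 + a1 * s0) % Q
        mults += 5
    return c, mults


def precompute(s):
    """Cache s'[i] = s[2i+1] * zeta^(2*br7(i)+1) for one polynomial."""
    return [(s[2 * i + 1] * z) % Q for i, z in enumerate(TWIDDLES)], 128


def basemul_cached(a, s, s_prime):
    c, mults = [0] * 256, 0
    for i in range(128):
        a0, a1, s0, s1 = a[2 * i], a[2 * i + 1], s[2 * i], s[2 * i + 1]
        c[2 * i] = (a0 * s0 + a1 * s_prime[i]) % Q
        c[2 * i + 1] = (a0 * s1 + a1 * s0) % Q
        mults += 4
    return c, mults


def matvec(A, s_vec, cached):
    """Matrix-vector product in NTT domain; returns (result, multiplication count)."""
    mults, caches = 0, []
    if cached:
        for s in s_vec:
            sp, m = precompute(s)
            caches.append(sp)
            mults += m
    out = []
    for row in A:
        acc = [0] * 256
        for j, s in enumerate(s_vec):
            if cached:
                c, m = basemul_cached(row[j], s, caches[j])
            else:
                c, m = basemul_plain(row[j], s)
            acc = [(x + y) % Q for x, y in zip(acc, c)]
            mults += m
        out.append(acc)
    return out, mults
```

In this model a $k \times k$ product needs $5 \cdot 128 \cdot k^2$ multiplications without caching and $128k + 4 \cdot 128 \cdot k^2$ with caching, so the relative saving grows with $k$.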


Better Accumulation. We also make use of an improved accumulation strategy
in the matrix-vector product as presented in [CHK+21]. For the computation
of one element of the output vector in a matrix-vector product, a total number
of $k$ base multiplications as well as $k - 1$ accumulating additions are required.
Instead of reducing each coefficient directly after the base multiplication before
accumulating, we delay this step until all $k$ base multiplication results have
been accumulated. We also implement this technique for the computation of the
inner product. For the implementation, we define three variants of the caching
and non-caching base multiplication functions each: one that takes 16-bit input
values and writes to a 32-bit output array, one that takes unreduced 32-bit input
values and writes to a 32-bit output array, and one that also takes
unreduced 32-bit input values but outputs reduced and packed coefficients in
a 16-bit integer array. For the second type of function, the operation on
32-bit values also allows for the usage of smla{b,t} instead of smul{b,t} such that
no extra addition is required for the accumulation, compared to the case of
computing on packed 16-bit coefficients.

Due to the small size of the Kyber prime, the sum will never overflow a signed
32-bit integer: For the matrix-vector products in Kyber using asymmetric
multiplication, possible vector inputs are the output of an NTT, which is in
$[-\frac{q-1}{2}, \frac{q+1}{2})$, or the cached Montgomery multiplication result from the
asymmetric multiplication, which is in $(-q, q)$. The coefficients of the matrix generated
using the on-the-fly approach from [BKS19] are smaller than $q$. Therefore, the
result of each multiplication is in $(-q^2, q^2)$. For $k$ accumulations with
$k \in \{2, 3, 4\}$, we get a maximum absolute intermediate value of $kq^2 \le 4q^2 < 2^{31}$.
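The bound can be verified directly, and the equivalence of eager and delayed reduction follows from the linearity of reduction modulo $q$. The sketch below uses hypothetical function names and also checks the slightly more conservative bound $2kq^2$, which accounts for each base-multiplication coefficient being a sum of two products; both stay far below $2^{31}$.

```python
# Illustrative sketch of delayed reduction; function names are assumptions.
Q = 3329
INT32_MAX = 2**31 - 1


def accumulate_eager(xs, ys):
    """Reduce after every product, then accumulate."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = (acc + (x * y) % Q) % Q
    return acc


def accumulate_lazy(xs, ys):
    """Accumulate unreduced products in a 32-bit accumulator, reduce once."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
        assert -INT32_MAX - 1 <= acc <= INT32_MAX   # would overflow on hardware
    return acc % Q


def worst_case(k, products_per_term=1):
    """Upper bound on the absolute value of the unreduced accumulator."""
    return k * products_per_term * Q * Q
```

With $k = 4$, `worst_case(4)` is $4q^2 = 44\,328\,964$, which needs only 26 bits, so the headroom in a signed 32-bit accumulator is considerable.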


4 Improvements to Dilithium Implementations 


For Dilithium we deploy similar strategies for optimizing the NTT and iNTT as for
Kyber and optimize the multiplication of $c$ and $s_1$, as well as $c$ and $s_2$.


4.1 NTT and Inverse NTT 


For the NTT, we merge the layers as 7-5, 4-2, and 1-0 to reduce the number of memory
operations. This differs from the previous implementation [GKS20, GKOS18],
where layers 7-6, 5-4, 3-2, and 1-0 are merged. For the iNTT, we similarly switch
to CT butterflies and merge as in the NTT.


Switch to CT-Butterflies. Just as for Kyber, we switch to CT butterflies for
the computation of the iNTT. Further, we make use of a technique introduced
in [ACC+21, Appendix D] which computes light butterflies with one less reduction.
As opposed to Kyber, the coefficients' growth due to the light butterflies
is not of concern for Dilithium since values up to $256q$ fit in a 32-bit register.


4.2 Small NTTs for Dilithium 


In the signature generation of Dilithium, we recall that the polynomial $c$ consists
of $\tau$ coefficients equal to $\pm 1$ and $256 - \tau$ zeros, and all polynomials in $s_1$ and
$s_2$ consist of elements in $[-\eta, \eta]$. The absolute values of the coefficients in
$cs_1$ and $cs_2$ are bounded by $\tau\eta$, and the computation can be regarded as taking
place in $\mathbb{Z}_{q'}$ for $q' > 2\tau\eta$ [CHK+21, Section 2.4.6]. As far as we know, all
implementations choose $q' = 8380417$ and employ the NTT defined for Dilithium.
However, since only the correct $cs_1$ and $cs_2$ are required, there is some freedom
for choosing $q'$. The products $\tau \cdot \eta$ are $39 \cdot 2 = 78$ for Dilithium2,
$49 \cdot 4 = 196$ for Dilithium3, and $60 \cdot 2 = 120$ for Dilithium5.
Consequently, we choose the Fermat number $q' = F_3 = 257$ for Dilithium2 and
Dilithium5, and $q' = 769$ for Dilithium3. Alternatively, one can also re-use the
Kyber prime $q' = 3329$ for any of the parameter sets in case re-using the code is of
interest. We have also experimented with the Fermat number $q' = F_4 = 65537$ for
Dilithium3. However, this did not result in a speed-up compared to $q' = 769$.
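The argument that a small modulus suffices can be checked experimentally: as long as $q' > 2\tau\eta$, reducing the product modulo $q'$ and taking the centered representative recovers the exact integer coefficients. The following sketch uses schoolbook multiplication for clarity; the sampling helpers and other names are assumptions and do not follow the Dilithium specification's sampling procedures.

```python
# Illustrative sketch; sampling helpers are simplified assumptions and do not
# follow the sampling procedures of the Dilithium specification.
import random

N = 256
PARAMS = {"Dilithium2": (39, 2), "Dilithium3": (49, 4), "Dilithium5": (60, 2)}  # (tau, eta)
SMALL_MODULUS = {"Dilithium2": 257, "Dilithium3": 769, "Dilithium5": 257}


def negacyclic_product(c, s, modulus=None):
    """Product in Z[X]/(X^N + 1), optionally reduced modulo `modulus`."""
    out = [0] * N
    for i, ci in enumerate(c):
        if ci == 0:
            continue
        for j, sj in enumerate(s):
            if i + j < N:
                out[i + j] += ci * sj
            else:
                out[i + j - N] -= ci * sj
    return out if modulus is None else [x % modulus for x in out]


def centered(x, q):
    """Representative of x mod q in [-(q-1)/2, (q-1)/2] for odd q."""
    r = x % q
    return r - q if r > q // 2 else r


def sample_challenge(tau, rng):
    c = [0] * N
    for pos in rng.sample(range(N), tau):
        c[pos] = rng.choice((-1, 1))
    return c


def sample_secret(eta, rng):
    return [rng.randint(-eta, eta) for _ in range(N)]
```

For Dilithium3 the bound $2\tau\eta = 392$ exceeds 257, which is why the larger modulus 769 is needed there.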


FNT for Dilithium2 and Dilithium5. For $q' = 257 = 2^8 + 1$, we have the FNT
defined over $\mathbb{Z}_{257}[X]/(X^{256} + 1)$. We implement the forward transformation with
7 layers of CT butterflies. Since the input coefficients for $c$, $s_1$, and $s_2$ are at
most in $[-\eta, \eta]$, we only need very few reductions. Since a CT butterfly
maps $(a, b)$ to $(a + \omega b, a - \omega b)$, we can implement it with mla and mls.
Furthermore, we can also take a closer look at the initial layers. Since $-1 \equiv 2^8
\pmod{257}$, the first layer can be written as
$\mathbb{Z}_{257}[X]/(X^{256} + 1) \cong \mathbb{Z}_{257}[X]/(X^{128} - 2^4) \times \mathbb{Z}_{257}[X]/(X^{128} + 2^4)$,
and the corresponding CT butterfly maps $(a, b)$ to $(a + 2^4 b, a - 2^4 b)$. We denote
such a computation as CT_FNT(a, b, 4). Notice that without loading twiddle factors,
we can implement CT_FNT(a, b, logW) efficiently with the barrel shifter as illustrated
in Algorithm 4.1.

Let iFNT be the inverse of FNT. We first observe that the inverse of $2^k$ can be
written as $2^{-k} = 2^{16-k} = -2^{8-k} \pmod{2^8 + 1}$. There are two places where we
need to multiply by an inverse of a power of two: (i) the inverses corresponding
to the butterflies with $\omega = 2^{\mathrm{logW}}$ in CT_FNT, and (ii) the scaling by $128^{-1}$ at
the end of iFNT. We denote by CT_iFNT(a, b, logW) the function mapping $(a, b)$
to $(a - 2^{\mathrm{logW}} b, a + 2^{\mathrm{logW}} b) = (a + 2^{8+\mathrm{logW}} b, a - 2^{8+\mathrm{logW}} b)$ and implement it with the
barrel shifter as shown in Algorithm 4.2. Clearly, if CT_FNT(a, b, k) computes
$(a + 2^k b, a - 2^k b)$, then CT_iFNT(a, b, 8 - k) computes $(a + 2^{-k} b, a - 2^{-k} b)$, which
can be used in iFNT. We compute iFNT with four layers of GS butterflies followed
by three layers of CT butterflies. During the GS butterflies, since the twiddle
factors are also very small, we can replace some of the mul, add, and sub with
mla and mls. For the CT butterflies, since the twiddle factors are powers of two,
we implement them with Algorithm 4.2. Lastly, at the end of the CT butterflies, we
merge the twisting by powers of two with the multiplication by $128^{-1}$.
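The two shift-based butterflies can be modelled in Python on unbounded integers, which makes the identities above easy to verify modulo 257. The function names below are assumptions; each statement corresponds to one add or sub instruction with a shifted second operand.

```python
# Illustrative model of the shift-based butterflies; names are assumptions.
Q = 257


def ct_fnt(a, b, log_w):
    """Maps (a, b) to (a + 2^log_w * b, a - 2^log_w * b) using shifts only."""
    a = a + (b << log_w)             # add a, a, b, lsl #logW
    b = a - (b << (log_w + 1))       # sub b, a, b, lsl #(logW+1)
    return a, b


def ct_ifnt(a, b, log_w):
    """Maps (a, b) to (a - 2^log_w * b, a + 2^log_w * b) using shifts only."""
    a = a - (b << log_w)             # sub a, a, b, lsl #logW
    b = a + (b << (log_w + 1))       # add b, a, b, lsl #(logW+1)
    return a, b
```

The second instruction re-uses the already updated `a`, which is why the shift amount is `logW + 1`: subtracting $2^{\mathrm{logW}+1} b$ from $a + 2^{\mathrm{logW}} b$ leaves $a - 2^{\mathrm{logW}} b$. Calling `ct_ifnt(a, b, 8 - k)` then yields a butterfly with the inverse twiddle $2^{-k}$ modulo 257.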


Algorithm 4.1: CT_FNT(a, b, logW).

Input : $(a, b) = (a, b)$
Output: $(a, b) = (a + 2^{\mathrm{logW}} b, a - 2^{\mathrm{logW}} b)$

1 add a, a, b, lsl #logW
2 sub b, a, b, lsl #(logW+1)


Algorithm 4.2: CT_iFNT(a, b, logW).

Input : $(a, b) = (a, b)$
Output: $(a, b) = (a - 2^{\mathrm{logW}} b, a + 2^{\mathrm{logW}} b)$

1 sub a, a, b, lsl #logW
2 add b, a, b, lsl #(logW+1)


NTT over 769 for Dilithium3. For Dilithium3, since the maximum absolute
value of $cs_1$ and $cs_2$ is bounded by $\tau\eta = 4 \cdot 49 = 196$, we cannot use
$q' = 257 < 2 \cdot 196$. We therefore choose $q' = 769$ and modify the NTT and iNTT from Kyber.
Except for discarding most of the Barrett reductions, the code is the same.
Recall that for the NTT in Kyber, we require the output to be in $[-\frac{q}{2}, \frac{q}{2}]$ for
the secret key. However, for Dilithium3, since we are only using the 16-bit NTT for
computing $cs_1$ and $cs_2$, we can remove the Barrett reductions at the end and
allow elements to grow up to $7q'$ in absolute value.

For the iNTT, replacing the modulus with $q' = 769$ allows us to postpone the Barrett
reductions by one layer and reduce the number of Barrett reductions by half. At
the end of the iNTT, we replace the 16-bit Montgomery multiplication with a straight
multiplication and a 32-bit Barrett reduction. By using the 32-bit Barrett reduction,
the result is within $[-384, 384]$ if the product is in $[-113025697, 113025697]$.
Since $\log_2(113025697/384) \approx 18.17$, we derive values in $[-384, 384]$ by applying the 32-bit
Barrett reduction to the product of any signed 16-bit value and any constant
from $[-384, 384]$. The downside of using the 32-bit Barrett reduction is a slightly
higher register pressure, but overall it is more favorable because we do not need
to reduce the values again. This is different from the 16-bit NTT in [ACC+21], where
the twist is implemented with a Montgomery multiplication and the result is then
reduced to $[-384, 384]$ with an additional 32-bit Barrett reduction.
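The quoted input bound can be reproduced with one common formulation of a rounding 32-bit Barrett reduction: multiply by $\lfloor 2^{32}/q' \rceil$, round the high word, and subtract the corresponding multiple of $q'$. Whether the implementation uses exactly this formulation is an assumption of the sketch below; the names `QBAR` and `barrett32` are hypothetical.

```python
# Illustrative sketch of a rounding 32-bit Barrett reduction for q' = 769.
# The exact formulation used in the implementation is an assumption here.
Q = 769
QBAR = ((1 << 32) + Q // 2) // Q      # round(2^32 / 769)
BOUND = 113025697


def barrett32(a):
    """Return a representative of a mod 769; centered if |a| <= BOUND."""
    t = (a * QBAR + (1 << 31)) >> 32  # rounded high word of a * QBAR
    return a - t * Q
```

With this formulation, inputs up to the bound stated in the text are mapped into $[-384, 384]$, and the first input for which the result leaves that range is the bound plus one. Since the product of a signed 16-bit value and a constant from $[-384, 384]$ has absolute value at most $32768 \cdot 384 = 12\,582\,912$, it is well inside the admissible range.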


5 Results 


In this section, we present the implementation results for Kyber and Dilithium.


5.1 Benchmarking Setup 


Our concrete hardware target is the STM32F4DISCOVERY with the STM32F407VG
MCU, which is also the target of previous publications concerning implementations
of post-quantum schemes on microcontrollers. It comes with 1 MiB
of flash memory and 192 KiB of RAM.

Our benchmarking setup is based on pqm4 [KRSS19]. During the benchmarks,
we clock the microcontroller at 24 MHz in order to avoid wait states
during memory operations. We compile the code using arm-none-eabi-gcc version
10.2.1 with the -O3 option. Regarding the Keccak implementation, we make
use of the code provided in pqm4. For the randomness generation we rely on the
microcontroller's hardware random number generator (RNG).

We compare our Kyber implementations to the code currently present in
pqm4, which is based on the work in [ABCG20] and [BKS19]. Similarly, we compare
our implementations of Dilithium (2 and 3) to the code in pqm4, which is
based on [GKS20]. For Dilithium5, pqm4 does not currently have an implementation
due to a lack of stack space. We apply some of the stack optimizations of
[GKS20] to our implementations, especially to make Dilithium5 work as well. It
is important to note that the parameters of Kyber and Dilithium were changed
at the start of the third round of the NIST PQC competition. The numbers
presented here reflect the round 3 versions contained in pqm4. Those are the
optimizations from the original papers ported to the third-round parameters. The
performance results for the full schemes therefore do not match the original publications.


5.2 Performance of NTT-Related Functions 


In Table 2, we present the cycle counts for the transformations we deploy in our
implementations of Kyber and Dilithium. For the Kyber NTT, we achieve a speedup
of 12.6%. Regarding the Kyber iNTT, we obtain a speedup of up to 21.3%. Note
that for the stack-optimized variant an additional reduction is required before
the iNTT because of the absence of asymmetric multiplication.

We achieve a speedup of 5.2% for the Dilithium NTT and 5.7% for the iNTT.
For the small NTTs, the metric we are optimizing is
$(k + l) \cdot \mathrm{NTT} + \#\mathrm{reps} \cdot (\mathrm{NTT} + (k + l) \cdot (\mathrm{basemul} + \mathrm{iNTT}))$.
As most of the small NTTs are computed
outside of the loop, we moved some of the reductions into the NTT, resulting in
a faster basemul. Note that for q = 257 and q = 769 the NTT and iNTT have
very close performance, but the basemul differs. This results in the FNT being
advantageous for Dilithium2 and Dilithium5. For (basemul + iNTT), we achieve a
speedup of 37.6% for q = 257 and 33.1% for q = 769 compared to q = 8380417
from [GKS20]. We also compare our q = 769 implementation to an existing one
by [ACC+21] because, theoretically, their 6-layer approach could be used
as well. Since the computation is dominated by (basemul + iNTT), we find that
our 7-layer approach is faster. We also carefully examined the code by [ACC+21]
and found that the last 32-bit Barrett reduction is performed outside the reported
iNTT, so the actual speedup is larger than the reported cycle counts suggest.
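The metric and the quoted relative speedups can be recomputed from the cycle counts in Table 2 and the parameters in Table 1. The sketch below is plain arithmetic on those published numbers; the dictionary layout and function names are assumptions.

```python
# Worked arithmetic on the cycle counts of Table 2 and the parameters of
# Table 1; data layout and function names are assumptions of this sketch.
CYCLES = {
    "q=8380417 [GKS20]": {"ntt": 8540, "intt": 8923, "basemul": 1955},
    "q=257": {"ntt": 5524, "intt": 5563, "basemul": 1225},
    "q=769": {"ntt": 5200, "intt": 5537, "basemul": 1740},
    "q=769 [ACC+21]": {"ntt": 4852, "intt": 4817, "basemul": 2966},
}
SETS = {  # name: (k, l, expected repetitions)
    "Dilithium2": (4, 4, 4.25),
    "Dilithium3": (6, 5, 5.1),
    "Dilithium5": (8, 7, 3.85),
}


def metric(k, l, reps, c):
    """(k + l) * NTT + #reps * (NTT + (k + l) * (basemul + iNTT))."""
    return (k + l) * c["ntt"] + reps * (c["ntt"] + (k + l) * (c["basemul"] + c["intt"]))


def speedup_percent(old, new):
    return round(100 * (1 - new / old), 1)


def basemul_plus_intt(c):
    return c["basemul"] + c["intt"]
```

For instance, `metric(*SETS["Dilithium3"], CYCLES["q=769"])` is smaller than the same expression with the 6-layer cycle counts, although the 6-layer NTT and iNTT are individually faster, because the basemul term is multiplied by $\#\mathrm{reps} \cdot (k + l)$.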

Table 3 contains the results of our benchmarks of the matrix-vector product (MVP)
and inner product (IP) functions as deployed in Kyber. For the MVP, we consider the
MVP as it is computed in the key generation. The MVP in the encryption is similar but
contains k NTTs less. Note that in the actual implementation of Kyber, the
MVP is interleaved with the on-the-fly generation of the matrix. For ease of
comparison, we additionally provide benchmarks for a stripped-down variant
of the MVP excluding the hashing. Regarding our benchmarks, we count the
caching for the asymmetric multiplication towards the MVP although the IP for
the encryption also benefits from this pre-computation. For the same reasons as for
the MVP, the benchmarks of our IP functions only include the NTTs, the base
multiplications, and deserialization, if applicable. For the speed-optimized MVP
implementation, we get speedups between 15.9% and 17.8% (excl. hashing). The
stack-optimized variant achieves speedups between 12.1% and 12.5%. We achieve
speedups of 26.9%-31.7% (enc) and 21.6%-23.3% (dec) for the speed-optimized
inner product, while for the stack variant we obtain speedups of 4%-6.3% and
17.3%-18.9%, respectively. We observe that for larger k, the speed optimization
strategy gives increasingly lower cycle counts due to the asymmetric multiplication.


Table 2. Cycle counts for transformation operations of Kyber and Dilithium. NTT
and iNTT correspond to the schemes' default transformations, i.e., q = 3329 for Kyber
and q = 8380417 for Dilithium. The NTT with q = 257 is deployed for Dilithium2 and
Dilithium5, and the NTT with q = 769 is used for Dilithium3.


Scheme     Prime        Implementation      NTT   iNTT         basemul
Kyber      q = 3329     [ABCG20]            6852  6979         2317
                        This work           5992  5491/6282^a  1613^b
Dilithium  q = 8380417  [GKS20]             8540  8923         1955
                        This work           8093  8415         1955
           q = 257      This work           5524  5563         1225
           q = 769      [ACC+21] (6-layer)  4852  4817         2966
                        This work           5200  5537         1740

a First value is for speed-optimization, second for stack-optimization.
b Asymmetric basemul as used in the IP (enc). As the basemul in the MVP
and IP consists of individual function calls, the cycle count is not
straightforward to measure.


Table 3. Cycle counts for matrix-vector and inner products used in Kyber.


Implementation  Variant  Operation                Kyber-512  Kyber-768  Kyber-1024
pqm4                     Matrix-Vector Product^a  66291      127634     209517
                         Matrix-Vector Product^b  226580     484077     840498
                         Inner Product (enc)      11978      14696      17429
                         Inner Product (dec)      29888      41910      53792
This work       speed    Matrix-Vector Product^a  55746      106380     172152
                         Matrix-Vector Product^b  211606     457213     796349
                         Inner Product (enc)      8762       10331      11898
                         Inner Product (dec)      23425      32354      41275
                stack    Matrix-Vector Product^a  58028      112503     184149
                         Matrix-Vector Product^b  214053     463590     808206
                         Inner Product (enc)      11218      13877      16733
                         Inner Product (dec)      24722      34167      43619

a Measurement excluding the hashing.
b Measurement including the hashing.

5.3 Performance of Schemes 


Per Table 4, we achieve speedups of 3.3%-4.2%, 3.1%-3.6%, and 5.1%-5.2% for
the key generation, encapsulation, and decapsulation with our speed-optimized
variant. As is to be expected due to the caching of intermediate values for speed
optimizations, our speed implementation has a higher stack usage. Our stack
implementations use essentially the same stack as previous work.

Table 5 contains the results for Dilithium. We achieve consistent speedups for
all parameter sets. The absolute savings due to our optimizations are clearly seen,
particularly in signing. The speedup for signing ranges from 1.5% to 5.6%. In
relative terms, the impact of our optimizations on the full Kyber and Dilithium
schemes seems relatively small compared to the speedups we gain for the polynomial
arithmetic. This is due to the dominance of the hashing operations, as thoroughly
analyzed in previous work [KRSS19].

Table 4. Cycle counts and stack usage for Kyber for the key generation, encapsulation,
and decapsulation. Cycle counts are averaged over 100 executions.

Implementation  Variant      Kyber-512          Kyber-768          Kyber-1024
                             cc    stack [B]    cc    stack [B]    cc           stack [B]
pqm4, [ABCG20]           K   458k  2 220        745k  3 100        1188k        3 612
                         E   553k  2 308        899k  2 780        [illegible]  3 292
                         D   513k  2 324        839k  2 804        1294k        3 324
This work       speed    K   443k  4 272        718k  5 312        1138k        6 336
                         E   536k  5 376        870k  6 416        [illegible]  7 432
                         D   487k  5 384        796k  6 432        1227k        7 448
                stack    K   444k  2 220        724k  2 736        1149k        3 256
                         E   540k  2 308        879k  2 808        [illegible]  3 328
                         D   492k  2 324        807k  2 824        1246k        3 352


Table 5. Cycle counts and stack usage for Dilithium. K, S, and V correspond to the key
generation, signature generation, and signature verification. Cycle counts are averaged
over 10000 executions.

Implementation  Variant      Dilithium2         Dilithium3         Dilithium5
                             cc     stack [B]   cc     stack [B]   cc     stack [B]
pqm4, [GKS20]            K   1602k  38k         2835k  61k         4836k  98k
                         S   4336k  49k         6721k  74k         9037k  115k
                         V   1579k  36k         2700k  58k         4718k  93k
This work       speed    K   1596k  8 508       2827k  9 540       4829k  11 696
                         S   4093k  49k         6623k  69k         8803k  116k
                         V   1572k  36k         2692k  58k         4707k  93k

Abstract. As the Internet of Things (IoT) rolls out today to devices
whose lifetime may well exceed a decade, conservative threat models 
should consider adversaries with access to quantum computing power. 
The IETF-specified SUIT standard defines a security architecture for 
IoT software updates, standardizing metadata and cryptographic tools— 
digital signatures and hash functions—to guarantee the update legiti- 
macy. SUIT performance has been evaluated in the pre-quantum con- 
text, but not yet in a post-quantum context. Taking the open-source 
implementation of SUIT available in RIOT as a case study, we survey 
post-quantum considerations, and quantum-resistant digital signatures 
in particular, focusing on low-power, microcontroller-based IoT devices 
with stringent memory, CPU, and energy consumption constraints. We 
benchmark a range of pre- and post-quantum signature schemes on a 
range of IoT hardware including ARM Cortex-M, RISC-V, and Espressif 
(ESP32), which form the bulk of modern 32-bit microcontroller architec- 
tures. Interpreting our benchmarks in the context of SUIT, we estimate 
the real-world impact of transition from pre- to post-quantum signatures. 


Keywords: Post-quantum · Security · IoT · Microcontroller ·
Embedded Systems


1 Introduction 


Decades of experience with the Internet and networked software has shown that 
you can’t secure what you can’t update. Meanwhile, recent technological and soci- 
etal trends have fuelled the massive deployment of cyberphysical systems; these 
systems are increasingly pervasive, and we are increasingly dependent on their
functionalities. A so-called Internet of Things (IoT) emerges, weaving together 
an extremely wide variety of machines (embedded software and hardware) which 
are required to cooperate via the network, at large scale. 

Unpatched devices—or worse, unpatchable devices—quickly become liabili- 
ties. Exploits weaponizing compromised IoT devices are demonstrated time and 
again, sometimes spectacularly as with botnets such as Mirai [7]. However, the 
cure can become a disease: software updates are themselves an attack vector. 
Legitimate software updates laced with malware can compromise the updated 
device [46]. Once IoT devices are deployed, up and running, it thus becomes cru- 
cial to understand how, and when, software embedded in IoT devices is updated; 
how software updates are secured; and what level of security is provided. 

In this paper, we study the impact of the pre- to post-quantum transition on 
IoT software updates, assuming that we want to maintain 128-bit conventional 
security (matching current internet security standards) while reaching NIST 
Level 1 post-quantum security. We aim to answer the following questions: 


— How do the practical costs of pre- and post-quantum security compare? 

— What is the footprint of post-quantum security, relative to typical low-power 
operating system footprints? 

— What are the potential alternatives for post-quantum signature schemes to 
secure IoT software updates, and which hash functions should be used? 


1.1 Low-Power IoT and Post-quantum Cryptography 


Low-Power IoT Characteristics. One prominent and highly challenging compo- 
nent of IoT deployments consists in integrating low-power, resource-constrained 
IoT devices into the distributed system. These devices are typically based on 
low-cost microcontrollers (e.g., ARM Cortex M, RISC-V, ESP), interconnected 
via low-power radio or wired communication. An estimated 250 billion micro- 
controllers are in use today around the globe [31]. Compared to microprocessor- 
based devices, microcontrollers aim for a different trade-off: They offer much 
smaller capacity in computing, networking, memory [17], in order to achieve 
radically lower energy consumption and a tiny price tag (<$1 unit price). It is
not uncommon to have a total memory budget of 64 KB of RAM and 500 KB of 
ROM (flash) for the whole embedded system software—including drivers, crypto 
libraries, OS kernel, network stack and application logic. Nonetheless, the func- 
tionalities and services provided by constrained microcontroller-based devices 
are as crucial as those of less constrained elements in the cyberphysical system. 


Post-quantum Cryptography. Post-quantum cryptosystems are designed to run 
on contemporary hardware, yet resist adversaries equipped with both classical 
and quantum computers. Many signature schemes claim post-quantum secu- 
rity, some old and some new, but until now none has seen wide deployment. 
Recent research in post-quantum cryptography has revolved around the National 
Institute of Standards and Technology (NIST) Post-Quantum Cryptography 
project [47], which will select a limited number of candidate schemes for
standardization. This process is currently in its third round; draft standards are
expected by 2024.


Post-quantum Security for Low-Power IoT. Let’s get back to the motto you can’t 
secure what you can’t update (securely). In our quest for post-quantum security, 
the first priority is to guarantee the legitimacy of software updates received via 
the network on low-power IoT devices. The crucial cryptographic tool here is a 
digital signature. Open standards targeting IoT security (such as the IETF [52]) 
specify a variety of signature schemes to secure software updates on low-power 
devices, including one scheme (LMS [41]) that offers quantum resistance. 


Implementation Approaches. Cryptographic implementations are often devel- 
oped to tackle specific problems, such as speed or size. Most implementations 
take advantage of special instructions or hardware, but this narrows their appli- 
cability to specific architectures, which does not fully reflect the reality of IoT. 
Usually, operating systems (OS) must support more than one architecture. 

Typically, new cryptographic algorithm implementations are demonstrated as 
stand-alone applications—a key first step in proving feasibility. But in practice, 
the OS does not have only the cryptography package: it has other modules, a 
network stack, and the kernel. 

Focusing on portability and wide deployment, our experimental work did not 
use any tuned assembly, or platform-specific instructions: we only modified the 
implementations to fit real-life conditions, such as those imposed by RIOT for 
our use-case (for example: not dedicating the entire stack to crypto). 


1.2 Contributions and Outline


In this paper, we:


— review the SUIT specification for secure software updates on low-power IoT 
devices, using its open-source implementation in the RIOT operating system 
as a case study; 

— show how crypto primitives including digital signatures and hash functions 
are used in compliance with SUIT; 

— analyze post-quantum considerations for SUIT-compliant hash functions, 
which we benchmark on low-power 32-bit microcontrollers; 

— survey post-quantum signature schemes, and derive a selection of schemes 
most applicable for the secure IoT software update use case; 

— benchmark signatures on heterogeneous low-power IoT hardware based on 
popular 32-bit microcontrollers (ARM Cortex-M, RISC-V and ESP32); 

— compare the performance of post-quantum signature schemes (LMS, Falcon, 
and Dilithium) against typical pre-quantum schemes (Ed25519 and secp256); 
and 

— conclude on the cost of post-quantum security, and outline perspectives for 
low-power IoT. 
We begin with a survey of related work in Sect. 2. In Sect. 3, we set out
our case study: we describe SUIT software updates, categorise typical software
update types, detail pre-quantum cryptographic considerations and begin to
identify the main issues for the transition to post-quantum cryptography. We
focus on post-quantum signature schemes in Sect. 4, explaining our choice of
candidate schemes for benchmarking. Our experimental results appear in Sect. 5; 
we interpret their impact in the context of SUIT software updates in Sect. 6, 
before concluding in Sect. 7. 


2 Related Work 


The performance of pre-quantum digital signature schemes in the context of 
secure software updates on various Cortex-M microcontrollers is evaluated 
in [55]. Various NIST candidate post-quantum schemes are compared as compo- 
nent algorithms in TLS 1.3 in [51], analyzing performance, security, and key and 
signature sizes, as well as the impact of post-quantum authentication on TLS 1.3 
handshakes in realistic network conditions, while [38] shows a real life experi- 
ment with clients using two post-quantum schemes: an isogeny-based algorithm 
(SIKE) and a lattice-based algorithm (HRSS). More recently, another experi- 
ment with different schemes was conducted by Cloudflare [20, 49]. 

For pure post-quantum cryptographic implementation work targeting micro- 
controllers, [18] evaluates the performance of stateful LMS on Cortex-M4 micro- 
controllers, while pqm4 [35] aims to implement and benchmark NIST candi- 
date schemes on Cortex-M4, with M4 assembly subroutines plugged into some 
of the PQClean implementations. (Note that among the NIST candidate sig- 
nature schemes, PQClean implements only Dilithium, Falcon, Rainbow, and 
SPHINCS+; of these, pqm4 implements only Dilithium and Falcon.) Software 
verifying SPHINCS, Rainbow-I, GeMSS, Dilithium2, and Falcon-512 signatures
in Cortex-M3 using less than 8 KB of RAM is presented in [28]. 

Many post-quantum signature schemes use standard SHA3 hashing under 
the hood. SHA3 performance in hardware (FPGA) has been studied [29, 34, 36],
but surprisingly few studies focus on SHA3 performance in software on low- 
power microcontrollers. Some prior work exists: [11] and [37] focus on 8-bit 
microcontrollers, while [30] compares the performance of Keccak variants on 
32-bit ARM Cortex-M microcontrollers. 


3 Case Study: Low-Power Software Updates with SUIT 


The IETF’s Software Updates for Internet of Things (SUIT) specifications [43, 
44] define a security architecture, standard metadata and cryptographic schemes 
able to secure IoT software updates, applicable on microcontroller-based devices. 
An open-source implementation of the SUIT workflow is available in RIOT [54], 
a common operating system for low-power IoT devices [10], which we use as the
base for our case study.

Fig. 1. SUIT secure software update workflow.


3.1 SUIT Workflow 


Figure 1 shows the SUIT workflow. In the preliminary Phase 0, the authorized 
maintainer flashes the IoT device with commissioning material: the bootloader, 
initial image, and authorized crypto material. Once the IoT device is commis- 
sioned, up and running, we iterate a cycle of Phases 1-5, whereby the authorized 
maintainer can build a new image (Phase 1), hash and sign the corresponding 
standard metadata (the so-called SUIT manifest, Phase 2) and transfer it to the
device over the network via a repository (e.g. a CoAP resource directory). The 
IoT device fetches the update and SUIT manifest from the repository (Phase 3), 
and verifies the signature (Phase 4). Upon successful verification, the new soft- 
ware is installed and booted (Phase 5); otherwise, the update is dropped. 

The cryptographic tools needed for software updates in general, and SUIT 
in particular, are a digital signature scheme and a hash function. The digital 
signature authenticates (a hash of) the software update binary. 

We distinguish four broad categories for low-power IoT software updates, 
defining the following four prototypical use cases: 


— U1: Software module update (≈5 KB)

— U2: Small firmware update without crypto libraries (≈50 KB)

— U3: Small firmware update including crypto libraries (≈50 KB)

— U4: Large firmware update (≈250 KB)


We will see that the costs and recommendations for post-quantum SUIT are 
different for each of these typical updates. 


3.2 Security Features of SUIT 


The metadata and the cryptographic primitives specified by SUIT can mitigate 
attacks exploiting software updates [42]. To give three simple examples: 


— Tampered/Unauthorized Firmware Update Attacks: Adversaries may try to 
update the IoT device with a modified, intentionally flawed firmware image. 
To counter this threat, SUIT specifies the use of digital signatures on a hash 
of the image binary and the metadata, to ensure the integrity of both. 

— Firmware Update Replay Attacks: Adversaries may replay a valid, but
old (known-to-be-flawed) update. To mitigate this threat, SUIT metadata 
includes a sequence number that is increased with each new firmware update. 

— Firmware Update Mismatch Attacks: Adversaries may send an authentic 
update to an incompatible device. To counter this, SUIT specifies the inclu- 
sion of device-specific conditions, to be verified before installing a firmware 
image. 
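
The three mitigations above can be sketched as a single acceptance check on the device. This is a toy model under stated assumptions: the `Manifest` fields, the byte encoding in `signed_bytes`, and the `accept_update` function are hypothetical and are not the SUIT wire format (real manifests are CBOR objects signed with COSE). The signature check is injected as a callable, and the demo signer is an HMAC-based placeholder, which is a symmetric MAC and not a digital signature scheme.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Manifest:
    """Toy stand-in for a SUIT manifest (hypothetical fields and encoding)."""
    sequence_number: int  # increased with each new firmware update
    class_id: str         # device-specific condition
    image_digest: bytes   # hash of the firmware binary
    signature: bytes      # signature over signed_bytes()

    def signed_bytes(self) -> bytes:
        # Illustrative encoding only; not the SUIT/COSE serialization.
        return b"|".join([
            str(self.sequence_number).encode(),
            self.class_id.encode(),
            self.image_digest.hex().encode(),
        ])


def accept_update(
    manifest: Manifest,
    image: bytes,
    device_class_id: str,
    last_sequence_number: int,
    verify: Callable[[bytes, bytes], bool],
) -> bool:
    """Return True only if all checks pass; otherwise the update is dropped."""
    if not verify(manifest.signed_bytes(), manifest.signature):
        return False  # tampered or unauthorized manifest
    if manifest.sequence_number <= last_sequence_number:
        return False  # replay of an old update
    if manifest.class_id != device_class_id:
        return False  # update intended for a different device
    digest = hashlib.sha256(image).digest()
    return hmac.compare_digest(digest, manifest.image_digest)


def make_toy_signer(key: bytes):
    """HMAC-based placeholder for demonstration; NOT a signature scheme."""
    def sign(message: bytes) -> bytes:
        return hmac.new(key, message, hashlib.sha256).digest()

    def verify(message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(sign(message), signature)

    return sign, verify
```

In a real deployment the `verify` callable would wrap the verification routine of the chosen signature scheme, with the public key installed at commissioning time.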


3.3 Hash Functions with SUIT 


The metadata of the update (the SUIT Manifest [43]) includes a cryptographic 
hash of the software update binary. The SUIT standard specification [43] allows 
the use of SHA-2 or SHA-3, with 224-, 256-, 384-, or 512-bit output. 
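
As a small illustration of the digest carried in the manifest, the sketch below hashes a firmware image that arrives in blocks, as it would over a constrained network link. It uses Python's standard `hashlib` module; the function name `image_digest` and the chunked interface are assumptions made for this example, not part of SUIT or RIOT.

```python
import hashlib
from typing import Iterable


def image_digest(chunks: Iterable[bytes], algorithm: str = "sha3_256") -> bytes:
    """Hash an image incrementally, one received block at a time."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()


if __name__ == "__main__":
    blocks = [b"block-0", b"block-1", b"block-2"]
    for name in ("sha256", "sha3_256"):
        print(name, image_digest(blocks, name).hex())
```

Both 256-bit variants yield a 32-byte digest, so switching between them does not change the manifest size.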


Post-quantum Considerations. There are few quantum attacks against SHA- 
2 and SHA-3 in the literature. Grover’s algorithm may be parallelized to find hash 
preimages [12]; this attack applies to both Merkle-Damgård hashes (e.g. SHA-2)
and Sponge-based hashes (e.g. SHA-3). For collision resistance, the state-of-the- 
art in quantum collision search does not drastically reduce the complexity with 
respect to classical algorithms [21]. On the other hand, classical attacks for SHA- 
2 might become a reality, as shown in [25]. 


Low-Power IoT Considerations. Low-power systems must run hash func- 
tions quickly, using as little power as possible; minimal memory (RAM and flash) 
usage is also desirable. In this context, since we aim for 128-bit security, the two 
functions we should consider for SUIT are SHA-256 and SHA3-256. 

Table 1 compares the memory usage and speed of three hash function imple- 
mentations on an ARM Cortex M4 microcontroller: RIOT’s default implemen- 
tation of SHA-256, a compact implementation of SHA3-256 optimized to min- 
imize flash memory, and an implementation of SHA3-256 optimized for speed 
on Cortex-M4 ARMv7M architectures. Stack is roughly equivalent across the 
different implementations, but speed and flash vary widely: SHA3-256 can offer 
slightly faster execution than SHA-256, but at the price of a 10x larger flash 
footprint. For a flash footprint similar to SHA-256, the comparative speed of 
SHA3-256 diminishes drastically for larger inputs. For more detailed analysis of 
different Keccak variants on microcontrollers, see [30]. 


Table 1. SHA2 and SHA3 performance on an ARM Cortex-M4 microcontroller. 


Hash function           Flash (B)  Stack (B)  Time (KTicks) to hash
                                               64B   100B   1024B   10240B
SHA-256 (RIOT OS)            1008        384   277    278    1943    17933
SHA3-256 Compact             1692        404  1336   1342   10402    98448
SHA3-256 fast-ARMv7M        11548        284   223    228    1672    15732
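
The trade-off described above can be made concrete by normalizing each row of Table 1 against the SHA-256 baseline. This is only a worked calculation on the published numbers; the names `TABLE_1` and `relative_to_sha256` are illustrative.

```python
# Values copied from Table 1: flash and stack in bytes, time in KTicks.
INPUT_SIZES = (64, 100, 1024, 10240)
TABLE_1 = {
    "SHA-256 (RIOT OS)": {"flash": 1008, "stack": 384,
                          "kticks": (277, 278, 1943, 17933)},
    "SHA3-256 Compact": {"flash": 1692, "stack": 404,
                         "kticks": (1336, 1342, 10402, 98448)},
    "SHA3-256 fast-ARMv7M": {"flash": 11548, "stack": 284,
                             "kticks": (223, 228, 1672, 15732)},
}


def relative_to_sha256(name: str) -> dict:
    """Flash and per-input-size time ratios of `name` versus SHA-256."""
    base = TABLE_1["SHA-256 (RIOT OS)"]
    row = TABLE_1[name]
    return {
        "flash": row["flash"] / base["flash"],
        "time": {
            size: t / t0
            for size, t, t0 in zip(INPUT_SIZES, row["kticks"], base["kticks"])
        },
    }


if __name__ == "__main__":
    for name in TABLE_1:
        ratios = relative_to_sha256(name)
        times = ", ".join(f"{s}B: {r:.2f}x" for s, r in ratios["time"].items())
        print(f"{name}: flash {ratios['flash']:.1f}x; time {times}")
```

A time ratio below 1 means the implementation is faster than SHA-256 for that input size; a flash ratio above 1 means it occupies more flash.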

Conclusions. Based on our analysis, there are no direct post-quantum aspects
to consider here. Rather, the choice of hash function should be driven by low-power
criteria, and by other indirect post-quantum aspects detailed below. Recall the
four prototypical use cases from Sect. 3.1. In U1 and U2, the updated software
does not include the hash function implementation (the cryptographic tools are
external, e.g., in a bootloader). In such cases, the flash memory overhead for the
hash function is of no concern, and SHA3-256 (optimized for speed) is the best
choice. In U3 and U4, however, the update includes the cryptographic tools and
the hash function code; thus, a tradeoff appears. For small firmware updates as
in U3, a 10 KB flash overhead represents a significant 25% bump in what needs
to be stored on the device and transmitted over the network. As updates are
infrequent, execution speed may be considered less of a priority, and thus both
SHA-256 and flash-optimized SHA3-256 are valid options. For larger updates
as in U4, the storage and transfer overhead is negligible, so speed-optimized
SHA3-256 is the best option again.

Let us now consider a complementary perspective: most post-quantum signa- 
ture scheme proposals use SHA-3 in their constructions. Indeed, candidates for 
the upcoming NIST post-quantum signature standard are required to be SHA- 
3/SHAKE compatible, because that is the current US standard. Since space for 
code on IoT devices is very limited, factorization is typically desirable: using a 
single hash function for both hashing and signing reduces the flash footprint. 

For these reasons, SHA3-256 is the primary choice in our case study.


3.4 Digital Signatures with SUIT 


The SUIT architecture relies on the software update distributor (the authorized 
maintainer in Fig. 1) issuing a long-term public-private key pair used to generate 
and verify digital signatures on IoT software updates. The public key is pre- 
installed on the IoT device(s) to be updated during commissioning (Phase 0). 

Digital signature use in SUIT is specified in the COSE standard [48], which 
defines how to sign and encrypt compact (CBOR) binary serialized objects. For 
the 128-bit classical security level, COSE specifies the elliptic-curve signature 
schemes Ed25519 and ECDSA on NIST P-256. These schemes offer very small 
public (and private) keys at 32B each, and 64B signatures. 

To give some concrete perspective, Table 2 shows the memory footprint of 
SUIT and related software components using Ed25519, compared to the whole 
software embedded on the IoT device. This measurement uses the open-source 
RIOT implementation on the Nordic nRF52840 Development Kit, a popular 
low-power IoT board based on an ARM Cortex-M4 microcontroller. The flash 
memory footprint of this firmware is 52.5 KB; the RAM (stack) usage is 16.3 KB. 

In this typical pre-quantum configuration, the crypto represents a small part
of the flash footprint: under 15% of a ~50 KB total. The elliptic-curve signature
adds 15% to the size of the SUIT manifest metadata and roughly 0.1% to the
data that must be transferred over the network, counting the manifest and the
firmware binary as depicted in Table 2.

Table 2. Network transfer cost and decomposition of SUIT firmware update (for
nRF52840 Dev Kit) using minimal metadata with Ed25519+SHA-256.

          SUIT                 | OS firmware
          Metadata  Signature  | Total  Crypto  Kernel Modules  Network Modules  OTA
Size (B)       419         64  | 52485    7161           17039            20113  8172
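
The proportions discussed above follow directly from Table 2. The sketch below recomputes them, assuming that the "Total" column is the sum of the four firmware components (which the numbers confirm) and that the transferred data consists of the metadata, the signature, and the firmware binary. Variable and function names are illustrative.

```python
# Sizes in bytes, copied from Table 2.
FIRMWARE_PARTS = {"crypto": 7161, "kernel": 17039, "network": 20113, "ota": 8172}
FIRMWARE_TOTAL = 52485
METADATA = 419
SIGNATURE = 64


def crypto_share_of_flash() -> float:
    """Fraction of the firmware image occupied by the crypto modules."""
    return FIRMWARE_PARTS["crypto"] / FIRMWARE_TOTAL


def signature_share_of_metadata() -> float:
    """Signature size relative to the unsigned manifest metadata."""
    return SIGNATURE / METADATA


def signature_share_of_transfer() -> float:
    """Signature size relative to everything sent over the network (assumed
    to be metadata + signature + firmware binary)."""
    return SIGNATURE / (METADATA + SIGNATURE + FIRMWARE_TOTAL)


if __name__ == "__main__":
    print(f"crypto / flash:       {100 * crypto_share_of_flash():.1f}%")
    print(f"signature / metadata: {100 * signature_share_of_metadata():.1f}%")
    print(f"signature / transfer: {100 * signature_share_of_transfer():.2f}%")
```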


Post-quantum Considerations. Elliptic-curve schemes are advantageous 
because they provide high security guarantees even though keys and signatures 
are very small. However, the security of elliptic-curve signatures is guaranteed 
by the hardness of the elliptic-curve Discrete Logarithm Problem, which can be 
solved efficiently on large quantum computers using Shor’s algorithm [13,32,50]. 

It is important to note that a breakthrough in quantum computing at a time 
T will not affect the security of elliptic-curve signatures generated before T, but 
it would certainly destroy the security of any elliptic-curve signatures generated 
after T. In our use case, the distributor’s key pair has a very long planned 
lifetime, possibly equal to that of the devices to be updated; securely updating 
the key itself will be impossible, or at least undesirable. We therefore need to 
build in resistance to the quantum threat in anticipation of such a development.


Low-Power IoT Considerations. The range of post-quantum signature 
schemes considered as potential replacements for elliptic-curve signatures is 
wide and diverse, and the idiosyncrasies that distinguish the various schemes 
are exaggerated by the constraints of low-power IoT devices. However, all of 
these schemes have public key and signature sizes that are one or two orders of 
magnitude larger than the elliptic-curve equivalents. Post-quantum signatures 
are therefore far from drop-in replacements; they represent a significant research 
challenge for microcontroller and IoT implementations. 

Nevertheless, the IETF recently began standardizing alternative signature 
schemes with COSE/SUIT for post-quantum security, such as LMS [41]. In the 
next sections, we survey alternative quantum-resistant schemes, comparing their 
performance against state-of-the-art pre-quantum schemes in SUIT. 


4 Post-quantum Digital Signatures 


The signature schemes that we consider target at least NIST Level 1 for post- 
quantum security. This is the basic security level proposed by NIST as part of 
its Post-Quantum Cryptography (PQC) Standardization Project [47]. Level 1 
security includes both 128 bits of classical security, and an equivalent level of 
security with respect to some model of quantum computation. That is, an
adversary should require on the order of 2^128 operations to gain any non-negligible
advantage when attacking the scheme, even if this adversary benefits from quan- 
tum computing power. The 128-bit security level is now standard in mainstream 
internet applications requiring long-term security. 

4.1 Post-quantum Signature Paradigms


We can classify post-quantum signatures according to the underlying hard problems
that guarantee their security:


Hash-Based Signatures. Hash-based signatures are among the oldest digital sig- 
nature schemes. Their security is based on the difficulty of inverting crypto- 
graphic hash functions. The security assumptions have been well studied, which 
gives an academic maturity to the problem. Hash-based signatures tend to offer 
very fast verification, though this comes at the cost of very large signatures. 


Lattice-Based Signatures. These schemes are based on hard problems in 
Euclidean lattices, and related problems like Learning With Errors (LWE). These 
schemes offer fast signing and verification, but have relatively large signatures. 


Multivariate Signatures. The security of “multivariate” schemes is based on the 
difficulty of solving certain low-degree polynomial systems in many variables. A 
recent analysis in [16] has brought their security levels into question. 


Isogeny-Based Signatures. Isogeny-based cryptosystems are based on the difficulty
of computing unknown isogenies between elliptic curves. Recent isogeny-based
signature schemes such as SQISign [26] inherit small parameter sizes from
conventional elliptic-curve cryptography (ECC), making them interesting for 
microcontroller applications, but they also inherit and increase ECC’s burden of 
heavy algebraic calculations, which makes for very slow runtimes. These signa- 
ture schemes have not yet been subjected to extensive security analysis. 


Code-Based Signatures. Code-based cryptosystems are based on the difficulty 
of hard problems from the theory of error-correcting codes. The McEliece key 
exchange scheme [40] is among the oldest of all public-key cryptosystems. Code- 
based signatures, on the other hand, are much less well-established. 


Zero-Knowledge-Based Signatures. A new category of post-quantum signatures 
uses Zero-Knowledge (ZK) techniques, combining algorithms from symmetric 
cryptography with a technique known as Multi-Party Computation In The Head. 


Summary. Table 3 compares signature and key sizes, and maturity of security
analysis of various post-quantum signature scheme proposals, summarizing the 
“pros” and “cons” of each paradigm according to our requirements. 


4.2 Selection of Candidates 


When choosing candidate signature schemes, we must consider key and signature
sizes, runtime performance, and maturity with respect to security analysis. While
the relatively compact parameters of some isogeny- and code-based signature
schemes may make them interesting for future work targeting microcontrollers,
at present these schemes are far from theoretical maturity. The true security level
of the NIST multivariate and ZK-based candidates is a subject of current debate,
though their extremely large keys and/or signatures would likely eliminate them
from consideration for our applications in any case.

Table 3. Overview of post-quantum signature candidates. “Security analysis” reflects
the maturity of analysis of the scheme: here we consider the age of the scheme, recent
attacks, and how well-studied the underlying hard problem is.

Paradigm              Scheme             Security     Sizes (B)
                                         Analysis     Signature  Public Key  Private Key
Hash-based            LMS [41]           mature            4756          60           64
                      SPHINCS+-128f [8]  mature           17088          32           64
Lattice-based         Dilithium [9]      less mature       2420        1312         2528
                      Falcon [27]        less mature        666         897         1281
MQ-based              Rainbow-I [23]     not mature          66      157800       101200
                      GeMSS [19]         not mature      417416       14520           48
Isogeny-based         SQISign [26]       not mature         204          64           16
Code-based            WAVE [24]          not mature        1625   ~13000000          N/R
Zero-knowledge-based  Picnic3-L1 [22]    not mature       13802          34           17

The NIST PQC project has dominated research in post-quantum cryptogra- 
phy in recent years. Its candidate cryptosystems are a natural first port of call 
for credible post-quantum signature algorithms, since they have had the benefit 
of concerted analysis from the cryptographic community—especially the Round 
3 proposals, which are candidates for standardization in the coming years. How- 
ever, these are not the only algorithms that we should consider. For example, 
among hash-based signature schemes, we might compare the older LMS scheme 
(which is not a NIST candidate) with the newer SPHINCS+ scheme (which is 
a NIST Round 3 alternate). LMS has smaller computational requirements, but 
the signer must maintain some state between signatures; SPHINCS+ is a heavier
scheme, but it is stateless. Statelessness is an advantage for general applications.
In our use case, however, statefulness is natural (it corresponds to the
version number on the software update) and easier to maintain, so the lighter
LMS is the better fit.


Post-quantum Choices. For the reasons above, we chose to focus our efforts on 
three post-quantum signature algorithms: LMS, Dilithium, and Falcon, repre- 
senting the hash-based and lattice-based categories. LMS has 60B public keys 
and 4756-byte signatures. Dilithium II, targeting NIST security level 2, has 
1312B public keys and 2420B signatures. Falcon-512, targeting NIST security 
level 1, has 897B public keys and 666B signatures. 


Pre-quantum Choices. To make a meaningful comparison with pre-quantum 
algorithms, we selected two elliptic-curve schemes: the Ed25519 [15,33] scheme, 
and the historic standard ECDSA based on the secp256 curve [45]. These schemes
offer particularly small 32B public keys and 64B signatures. 
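
To get a first feel for what these sizes mean in practice, the sketch below expresses each signature as a percentage of the update payload for the four prototypical use cases of Sect. 3.1. It is a back-of-the-envelope calculation under stated assumptions: 1 KB is taken as 1024 bytes, the nominal update sizes are treated as exact, and metadata and public keys are ignored. All names are illustrative.

```python
# Signature sizes in bytes, as quoted in the text.
SIGNATURE_BYTES = {
    "Ed25519": 64,
    "ECDSA": 64,
    "Falcon-512": 666,
    "Dilithium": 2420,
    "LMS": 4756,
}

# Nominal update sizes from Sect. 3.1 (assumption: 1 KB = 1024 B).
UPDATE_BYTES = {
    "U1": 5 * 1024,
    "U2": 50 * 1024,
    "U3": 50 * 1024,
    "U4": 250 * 1024,
}


def signature_overhead_percent(scheme: str, use_case: str) -> float:
    """Signature size as a percentage of the update payload size."""
    return 100.0 * SIGNATURE_BYTES[scheme] / UPDATE_BYTES[use_case]


if __name__ == "__main__":
    for scheme in SIGNATURE_BYTES:
        row = ", ".join(
            f"{uc}: {signature_overhead_percent(scheme, uc):.2f}%"
            for uc in UPDATE_BYTES
        )
        print(f"{scheme:<11} {row}")
```

Under these assumptions the signature is negligible for large firmware images regardless of the scheme, while for a small module update a hash-based signature is comparable in size to the payload itself.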


5 Benchmarks 


5.1 Hardware Testbed Setup 


Our benchmarks were run on popular, commercial, off-the-shelf IoT hardware, 
representative of the landscape of modern 32-bit microcontroller architectures: 


— ARM Cortex-M4: the Nordic nRF52840 Development Kit provides 
a typical ARM Cortex-M4 microcontroller running at 64 MHz, with 256 KB 
RAM, 1 MB flash, and a 2.4 GHz radio transceiver compatible with both 
IEEE 802.15.4 and Bluetooth Low-Energy. 

— Espressif ESP32: the WROOM-32 board (ESP32 module with the 
ESP32-D0WDQ6 chip on board) provides two low-power Xtensa® 32-bit LX6
microprocessors with integrated Wi-Fi and Bluetooth, operating at 80 MHz, 
with 520 KB RAM, 448 KB ROM and 16 KB RTC SRAM. 

— RISC-V: the Sipeed Longan Nano GD32VF103CBT6 Development 
Board provides a RISC-V 32-bit core running at 72 MHz with 32 KB RAM 
and 128 KB flash. 


IoT-Lab [6] provides this hardware for reproducibility on open access testbeds. 


5.2 Software Setup


We used RIOT [5] as a base for our benchmarks.


Pre-quantum Implementations. We used three different libraries, all currently 
supported in RIOT. 


Ed25519: For Ed25519, we used two libraries: C25519 (provided in [1]) and 
Monocypher [53]. Both contain constant-time finite-field arithmetic based 
on public-domain implementations [14]. One difference between Monocypher 
and C25519 is that Monocypher uses precomputed tables to speed up the 
computation of elliptic curve points. 

ECDSA: For ECDSA, we used Intel’s TinyCrypt library [2], which is designed
to provide cryptographic standards for constrained devices. ECDSA differs
from Ed25519 both in some specific details of the signature algorithm and in
using the NIST standard P-256 curve instead of Curve25519.


Post-quantum Implementations. We re-used publicly available code after making 
some small modifications to fit the hardware requirements. 
LMS: For LMS, we used the Cisco implementation [3], removing calls to malloc,
since dynamic allocation can lead to memory fragmentation [39] and, at such a
low level, can be dangerous and slow. This change might lead to some small
improvements in performance, since the kernel already knows the address at
compile time rather than only at runtime. For our benchmark, we used the smallest
parameters proposed in [41, Section 5]: that is, SHA-2 with 256-bit output for the
hash function (since we tried to keep the code as close as possible to [3])
with tree height 5, and 32 bytes associated with each node. For the LMOTS,
we use 32 bytes and 4 bits of width for Winternitz coefficients. We removed
the OpenSSL call from the original code and replaced it with an implementation
of SHA-256 provided in their repository [3]. Furthermore, we use HSS with
2 layers. These parameters satisfy the life cycle of updates: in particular, the
number of updates will never exceed the key lifetime.

Dilithium: We prepared two Dilithium implementations based on PQClean [4]. 

— Dynamic Dilithium is the basic PQClean implementation. The first 
step in signing and verifying is to expand a random seed given in the 
public key into a large matrix (cf. [9, Sec. 3.1]). 

— Static Dilithium modifies the PQClean implementation to precompute 
the matrix and store it in the flash memory. This makes signing and 
verification both faster, though it also requires more flash and reduces 
flexibility, since signatures can only be verified against the flashed key. 

Falcon: We used the Falcon implementation provided by PQClean [4], without 
any significant structural modifications. 
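
As a rough illustration of why Static Dilithium needs more flash, the following sketch estimates the size of the precomputed matrix. It assumes a 4 x 4 matrix of polynomials with 256 coefficients each (the dimensions of the Dilithium parameter set whose key and signature sizes are reported in Table 4) and one 32-bit integer per coefficient; these storage details are assumptions, not statements about the benchmarked code.

```python
# Assumed parameters (not taken from the benchmarked sources):
K, L = 4, 4              # matrix dimensions of the assumed parameter set
N = 256                  # coefficients per polynomial
BYTES_PER_COEFF = 4      # one 32-bit integer per coefficient

matrix_bytes = K * L * N * BYTES_PER_COEFF

# Flash figures reported for the Cortex-M board in Table 5.
flash_dynamic, flash_static = 11664, 26672
flash_delta = flash_static - flash_dynamic

print(f"precomputed matrix: {matrix_bytes} B")
print(f"measured flash difference (static - dynamic): {flash_delta} B")
```

Under these assumptions the matrix alone accounts for a flash overhead of the same order as the measured difference; the static build also drops the expansion code, so the two figures are not expected to coincide exactly.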


Parameter Sizes. Table 4 gives the sizes (in bytes) of the private key, public key, 
and signature for each of these schemes. 


Table 4. Key and signature sizes for benchmarked signature schemes. 


Category     | Algorithm     | Private Key (B) | Public Key (B) | Signature (B) 
Pre-quantum  | Ed25519       |              32 |             32 |            64 
Pre-quantum  | ECDSA p256    |              32 |             32 |            64 
Post-quantum | Falcon        |            1281 |            897 |           666 
Post-quantum | Dilithium     |            2528 |           1312 |          2420 
Post-quantum | LMS (RFC8554) |              64 |             60 |          4756 
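
The LMS public key and signature sizes in Table 4 can be cross-checked against the parameters chosen in Sect. 5.2 (two HSS layers, tree height 5, 32-byte hashes, Winternitz width 4). The sketch below follows the serialization layout of RFC 8554; the function names are hypothetical and only serve this illustration.

```python
import math

def lmots_chains(n: int, w: int) -> int:
    """Number of Winternitz chains p for an n-byte hash and w-bit digits."""
    u = math.ceil(8 * n / w)                      # digits covering the message hash
    max_checksum = (2 ** w - 1) * u
    v = math.ceil(max_checksum.bit_length() / w)  # digits covering the checksum
    return u + v

def hss_sizes(levels: int, h: int, n: int = 32, m: int = 32, w: int = 4):
    """Return (public key bytes, signature bytes, number of signatures)."""
    p = lmots_chains(n, w)
    lmots_sig = 4 + n * (p + 1)              # type code, randomizer C, p chain values
    lms_sig = 4 + lmots_sig + 4 + h * m      # leaf index q, LM-OTS sig, type code, path
    lms_pub = 4 + 4 + 16 + m                 # two type codes, identifier I, tree root
    hss_pub = 4 + lms_pub                    # level count, top-level LMS public key
    hss_sig = 4 + (levels - 1) * (lms_sig + lms_pub) + lms_sig
    return hss_pub, hss_sig, 2 ** (h * levels)

public_key, signature, capacity = hss_sizes(levels=2, h=5)
print(public_key, signature, capacity)
```

With these parameters the sketch yields a 60-byte public key and a 4756-byte signature, in agreement with Table 4, and a capacity of 1024 signatures before the key is exhausted.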


5.3 Pre- and Post-quantum Signature Benchmarks 


Tables 5, 6, and 7 present our benchmarking results on our three target 
architectures: Cortex-M, ESP32 and RISC-V. For each implementation we give the 
total flash memory used by the library, and for the signing and verification 
operations we list the running time in milliseconds and in thousands of “ticks” 
(computed from the hardware clock and time spent), and the stack required. 


1 More details about dynamic allocation in embedded devices are available from 
https://github.com/RIOT-OS/RIOT/blob/master/CODING_CONVENTIONS.md. 

We see that Monocypher’s Ed25519 is the fastest for signing among all the 
candidates, on all three boards. (Since the RISC-V board has only 32 KB RAM, 
the Falcon and Dilithium signing algorithms could not be run there.) Falcon 
offers the fastest verification on all three boards, followed by Static Dilithium. 
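
The KiloTicks columns of Tables 5, 6, and 7 relate to the millisecond columns through the clock frequency of each board. As a quick consistency check, the sketch below recovers the implied frequency from one row of each table; the helper name is hypothetical.

```python
def implied_mhz(time_ms: float, kiloticks: float) -> float:
    """Kiloticks per millisecond equals ticks per microsecond, i.e. MHz."""
    return kiloticks / time_ms

# (time in ms, KiloTicks) for Ed25519 (C25519) signing, copied from Tables 5, 6 and 7.
SAMPLES = {
    "Cortex-M (nRF52840)": (845, 54111),
    "ESP32 (WROOM-32)": (921, 73690),
    "RISC-V (Longan Nano)": (956, 68883),
}

for board, (time_ms, kiloticks) in SAMPLES.items():
    print(f"{board}: about {implied_mhz(time_ms, kiloticks):.1f} MHz")
```

For the RISC-V board the result is close to the 72 MHz core frequency stated in Sect. 5.1, which suggests that the two time columns are consistent with each other.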


6 The Impact of Post-quantum in SUIT/COSE 


6.1 The Cost of Post-quantum Security 


How do post-quantum security costs compare to typical pre-quantum security 
costs? A toe-to-toe comparison between pre- and post-quantum signatures must 
consider public key and signature sizes, running time, and memory requirements. 

Table 5. Signature benchmarks: ARM Cortex-M (nRF52840 Dev. Kit). 

Category     | Algorithm            | Flash (B) | Sign: Time (ms) | Sign: KiloTicks | Sign: Stack (B) | Verify: Time (ms) | Verify: KiloTicks | Verify: Stack (B) 
Pre-quantum  | Ed25519 (C25519)     |      5106 |             845 |           54111 |            1180 |              1953 |            125012 |              1300 
Pre-quantum  | Ed25519 (Monocypher) |     13852 |              17 |            1136 |            1420 |                40 |              2599 |              1936 
Pre-quantum  | ECDSA p256 (Tinycrypt) |    6498 |             294 |           18871 |            1084 |               313 |             20037 |              1024 
Post-quantum | Falcon               |     57613 |            1172 |           75020 |           42240 |                15 |              1004 |              4744 
Post-quantum | Dilithium (Dynamic)  |     11664 |             465 |           29788 |           51762 |                53 |              3407 |             36058 
Post-quantum | Dilithium (Static)   |     26672 |             135 |            8655 |           35240 |                23 |              1510 |             19504 
Post-quantum | LMS (RFC8554)        |     12864 |            9224 |          590354 |           13212 |               123 |              7908 |              1580 


Table 6. Signature benchmarks: Espressif ESP32 (WROOM-32 board). 

Category     | Algorithm            | Flash (B) | Sign: Time (ms) | Sign: KiloTicks | Sign: Stack (B) | Verify: Time (ms) | Verify: KiloTicks | Verify: Stack (B) 
Pre-quantum  | Ed25519 (C25519)     |      5608 |             921 |           73690 |            1312 |              2165 |            173205 |              1440 
Pre-quantum  | Ed25519 (Monocypher) |     17238 |              21 |            1709 |            1536 |                60 |              4864 |              2160 
Pre-quantum  | ECDSA p256 (Tinycrypt) |    6869 |             333 |           26696 |            1296 |               374 |             29948 |              1216 
Post-quantum | Falcon               |     60358 |            1172 |           93824 |           42504 |                16 |              1322 |              4920 
Post-quantum | Dilithium (Dynamic)  |     12397 |              87 |            7036 |           51954 |                43 |              3508 |             36242 
Post-quantum | Dilithium (Static)   |     27197 |             121 |            9694 |           35412 |                21 |              1706 |             19620 
Post-quantum | LMS (RFC8554)        |     15177 |            7583 |          606674 |           13488 |               101 |              8141 |              1808 


Table 4 shows that post-quantum algorithms always have larger public key 
and signature sizes, generally by well over an order of magnitude. Compared with 
standard elliptic-curve signature schemes, Falcon’s public keys are 28x larger 
and its signatures are 10.4x larger; Dilithium’s public keys are 41x larger than 
elliptic-curve keys, and its signatures are 38x larger. LMS avoids this spectacular 
growth in public key sizes, with keys only 1.875x larger than elliptic-curve public 
keys; but its signatures are a massive 74.3x larger than elliptic-curve signatures. 
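
The size ratios quoted in this paragraph follow directly from Table 4. A minimal sketch of the arithmetic, using 32-byte public keys and 64-byte signatures as the elliptic-curve baseline (the names below are illustrative only):

```python
EC_PUBLIC_KEY, EC_SIGNATURE = 32, 64      # Ed25519 / ECDSA p256 rows of Table 4

POST_QUANTUM = {                          # (public key bytes, signature bytes)
    "Falcon": (897, 666),
    "Dilithium": (1312, 2420),
    "LMS": (60, 4756),
}

def size_ratios(public_key: int, signature: int) -> tuple:
    """Return (public key ratio, signature ratio) against the EC baseline."""
    return public_key / EC_PUBLIC_KEY, signature / EC_SIGNATURE

for name, sizes in POST_QUANTUM.items():
    pk_ratio, sig_ratio = size_ratios(*sizes)
    print(f"{name}: public key {pk_ratio:.3f}x, signature {sig_ratio:.1f}x")
```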

Looking at running time, as we saw in Sect. 5, post-quantum signatures have 
their advantages and disadvantages. Signature verification is considerably faster 
across all the IoT devices that we tested. Signing is generally slower. A 
comparison of the signing algorithms in Table 5 shows that the fastest 
post-quantum algorithm runs in 135 ms, which is 7.94x slower than Ed25519 (Monocypher). 
But the tables are turned when we compare signature verification algorithms: 
The fastest pre-quantum algorithm runs in 40 ms, which is 2.65x slower than 
post-quantum Falcon. Efficient verification is a required and valuable feature (in 
all scenarios), but in this setting, it comes at the price of an increase in stack 
and flash memory. 

Looking at memory requirements, we see that post-quantum flash require- 
ments can grow to over 11x the smallest pre-quantum flash. Similarly, post- 
quantum algorithms impose a considerable increase in stack memory. 


Table 7. Signature benchmarks: RISC-V (Sipeed Longan Nano board). Falcon flash 
only contains the verification algorithm. Static Dilithium flash contains the verification 
algorithm and hard-coded public key. 


Category     | Algorithm            | Flash (B) | Sign: Time (ms) | Sign: KiloTicks | Sign: Stack (B) | Verify: Time (ms) | Verify: KiloTicks | Verify: Stack (B) 
Pre-quantum  | Ed25519 (C25519)     |      6024 |             956 |           68883 |            1312 |              2242 |            161475 |              1440 
Pre-quantum  | Ed25519 (Monocypher) |     17328 |              16 |            1194 |            1376 |                41 |              3013 |              1920 
Pre-quantum  | ECDSA p256 (Tinycrypt) |    7452 |             270 |           19489 |            1224 |               308 |             22192 |              1112 
Post-quantum | Falcon               |     11122 |               - |               - |               - |                13 |               975 |              4756 
Post-quantum | Dilithium (Dynamic)  |         - |               - |               - |               - |                 - |                 - |                 - 
Post-quantum | Dilithium (Static)   |     25148 |               - |               - |               - |                17 |              1237 |             19572 
Post-quantum | LMS (RFC8554)        |     15889 |            9105 |          655614 |           13352 |               122 |              8808 |              1736 


6.2 The Cost of Post-quantum SUIT/COSE 


What is the footprint of quantum-resistant security, relative to typical low-power 
operating system footprints? As a concrete example: consider a firmware update 
for RIOT on the nRF52840dk. In the classification of Sect. 3.1, the update is 


— type U2, where the update does not include the cryptographic libraries binary 
(i.e., these tools are external, e.g., in a bootloader), or 
— type U3, where the update includes the cryptographic libraries binary. 


We want to add quantum resistance to SUIT/COSE by changing the cryptographic 
algorithms from Ed25519 and SHA256 to Falcon, LMS, or Dilithium, 
and SHA3-256. 


Impact on the SUIT Manifest. In practical terms, the size of the SUIT 
manifest increases according to the new signature size. In Sect. 2 we saw that the 
SUIT manifest with pre-quantum Ed25519 (or ECDSA) has total size 419 + 64 = 
483 B. Moving to post-quantum signatures, this total becomes 


- Falcon: 419 + 666 = 1085 B, a ~2.24x increase; 
- Dilithium: 419 + 2420 = 2839 B, a ~5.87x increase; and 
- LMS: 419 + 4756 = 5175 B, a ~10.71x increase. 
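
The manifest totals can be recomputed from the signature sizes in Table 4. The sketch below assumes, as the text does, that the manifest body stays at 419 bytes and only the signature changes; the variable names are illustrative.

```python
MANIFEST_BODY = 419                        # manifest bytes excluding the signature
SIGNATURE_BYTES = {"Ed25519": 64, "Falcon": 666, "Dilithium": 2420, "LMS": 4756}

totals = {name: MANIFEST_BODY + size for name, size in SIGNATURE_BYTES.items()}
baseline = totals["Ed25519"]

for name, total in totals.items():
    # Ratio of the full manifest size to the pre-quantum manifest size.
    print(f"{name}: {total} B, {total / baseline:.2f}x the pre-quantum manifest")
```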


Impact on SUIT Software Update Performance. Now consider the crucial 
aspect of network transfer costs, and the memory resources required to actually 
apply the firmware update on the IoT device. Table 8 uses our measurements to 
evaluate the relative cost of the entire SUIT software update process. We see 
that the impact of switching to quantum-resistant security in SUIT varies widely in 
terms of network transfer costs, ranging from negligible increase (~1%) to major 
impact (3x more), depending on the software update use case. 


6.3 Post-quantum Signatures for IoT 


What are the potential alternatives for post-quantum digital signature schemes 
to secure IoT software updates? There are many possible deployments of IoT, 
and several possible scenarios for IoT software updates. It is safe to assume that 
the authorized maintainer, responsible for updating the firmware, has powerful 
hardware. Hence, the computational burden of signing is not the main concern 
here. On the other hand, a constrained device will be responsible for signature 
verification in Phases 3, 4, and 5 of the SUIT workflow in Fig. 1. 


Table 8. Relative costs for SUIT with quantum resistance (ARM Cortex M4). 


SUIT                      | Flash  | Stack  | Data Transfer (U2) | Data Transfer (U3) 
base w. Ed25519 / SHA256  | 52.4KB | 16.3KB | 471KB              | 53KB 
with Falcon / SHA3-256    | +120%  | +18%   | +1.1%              | +120% 
with LMS / SHA3-256       | +34%   | +1.2%  | +9%                | +43% 
with Dilithium / SHA3-256 | +30%   | +210%  | +4.3%              | +34% 


As we have seen above, the cryptography package does not run standalone on 
the board: it must coexist with several other modules (including kernel, network 
stack, and libraries), and the application itself. 

One challenge that we faced in deploying the schemes was sharing stack memory 
(and SRAM). For example, on our RISC-V platform (recall Table 7) the total RAM 
budget available was 32 kB for the whole system, which is very small, but not 
an uncommon budget. We could not run Dilithium to sign or verify within these 
limits, because it consumed all of the stack. In fact, we needed to adapt stack 
use for all of the post-quantum algorithms we used. 

Execution speed is another challenge. Slow signature verification may impact 
real-time applications if special care is not taken. Typically, on low-power IoT 
devices, there is no parallel computing. For instance, RIOT OS uses a preemptive 
multithreading paradigm, where a single thread is running at any given time. 
If signature verification takes a long time, running in a high-priority thread, 
then the system blocks on this task until completion. It is therefore necessary 
to carefully tune the priority of the crypto verification thread so as not to stop 
other functionally essential tasks, especially if signature verification is slow. 


6.4 Real-World Usability of Post-quantum Signatures 


Let us revisit the four prototypical software update categories from Sect. 3.1, 
and consider the choice of post-quantum signatures for each. 

In use cases U1 (a small module update) and U2 (small firmware update 
without crypto libraries), the package contains the software update and the 
signature. Hence, speed and signature size are more important than flash size. 
In these cases, Falcon has an advantage over LMS and Dilithium. 

The use case U3 (small firmware update with crypto libraries) is more com- 
plicated, with flash playing a much more crucial role. Since we must transfer 
the update with crypto over a low-power network, the package size has a higher 
impact on energy costs. As a point of reference, it takes 30-60 s to transfer 50 KB 
on a low-power IEEE 802.15.4 radio link, depending on link quality and network 
load (assuming non-extreme cases). Compare this with the difference of roughly 2 s 
in signature verification time among the candidate 
cryptosystems. In this case, as shown in Table 8, LMS presents the best tradeoff 
between flash size, network transfer costs, verification time, and stack size. 
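
To put these numbers side by side, the following sketch estimates the transfer time of a U3 update for each scheme. It takes the 53 KB baseline and the relative U3 increases from Table 8, and assumes (as a simplification not made in the text) that transfer time scales linearly with size at 30-60 s per 50 KB.

```python
BASE_U3_KB = 53.0                        # Table 8, U3 baseline (Ed25519 / SHA256)
U3_INCREASE = {"Ed25519": 0.0, "Falcon": 1.20, "LMS": 0.43, "Dilithium": 0.34}
SECONDS_PER_50_KB = (30.0, 60.0)         # best and worst case quoted in the text

def transfer_seconds(size_kb: float) -> tuple:
    """Return (best, worst) transfer time, assuming linear scaling with size."""
    best, worst = SECONDS_PER_50_KB
    return size_kb / 50.0 * best, size_kb / 50.0 * worst

for scheme, increase in U3_INCREASE.items():
    size_kb = BASE_U3_KB * (1.0 + increase)
    best, worst = transfer_seconds(size_kb)
    print(f"{scheme}: {size_kb:.1f} KB, roughly {best:.0f}-{worst:.0f} s")
```

Under this simple model, the differences in transfer time between schemes are tens of seconds, well above the verification time differences in Tables 5, 6, and 7.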

In use case U4 (larger updates), the large network transfer costs overwhelm 
the other costs, reducing the comparative advantages of one post-quantum sig- 
nature over another. 

From the point of view of cryptographic maturity, LMS is the safest choice. 
As noted in Sect. 4.2, hash-based problems have received extensive cryptanalysis 
from the cryptographic community, while the security of structured lattice-based 
schemes like Falcon is less well-understood. Nevertheless, compared to the pre- 
quantum state of the art, LMS imposes a significant increase in signature size and 
running time, which has a major impact on SUIT performance. Thus, despite 
its relative lack of maturity, the performance characteristics of Falcon make it 
extremely tempting for applications with smaller updates. 


Deployment of Post-quantum Security. On a positive note: even though 
it necessitates increased data transfer, flash, and stack, post-quantum security 
can be deployed on today’s IoT hardware (i.e. tomorrow’s legacy hardware). In 
a nutshell: we can upgrade to quantum-resistant software update security on 
heterogeneous legacy IoT hardware without vast changes in portable C code. 




It is clear that we will need to pay a price in the transition from pre-quantum to 
post-quantum algorithms. However, operating systems for low-power devices 
(such as RIOT) can already offer the tools to verify quantum-resistant signatures. 


7 Conclusion 


We have made an experimental study of the transition from pre- to post-quantum 
cryptography applied to securing software updates on low-power IoT devices, 
taking an open-source implementation of the IETF standard SUIT as a concrete 
case study. We compare the performance of standard pre-quantum and selected 
post-quantum candidates for the required cryptographic schemes (signatures and 
hashing), in the same environment (RIOT) on three low-power IoT platforms 
(ARM Cortex-M, RISC-V, and ESP32) representative of the current landscape of 
32-bit microcontrollers. We show that upgrading from classical 128-bit security 
to NIST Level 1 post-quantum security is indeed achievable today on these 
platforms, and we derive recommendations based on our performance analysis. 
We also characterize the toll of the pre- to post-quantum transition on memory 
footprints and network transfer costs in the IoT software update process. 


Future Work. The priority remains to stabilize the current versions of post- 
quantum signatures before pushing their implementations to common low-power 
embedded software platforms such as RIOT. Meanwhile, NIST has yet to deter- 
mine the new post-quantum signature standard; should new candidates be 
included in a new call, more analysis will be necessary. 


Abstract. Ring signatures and ID-based cryptography are considered 
promising in terms of application. A ring signature authenticates messages 
while the author of the message remains anonymous. ID-based 
cryptographic primitives suppress the need for certificates in public key 
infrastructures (PKI). In this work, we propose a generic construction 
for post-quantum ID-based ring signatures (IDRS) based on symmetric-key 
primitives from which we derive the first two constructions of IDRS. 
The first construction named PicRS utilizes the Picnic digital signature 
to ensure its security while the second construction XRS is motivated 
by the stateful digital signature XMSS instead of Picnic, allowing a 
signature size reduction. Both constructions have a competitive signature 
size when compared with state-of-the-art lattice-based IDRS. XRS can 
achieve a competitive signature size of 889 KB for a ring of 4096 users 
while the fully stateless PicRS achieves a signature size of 1.900 MB for a 
ring of 4096 users. In contrast, the shortest lattice-based IDRS achieves 
a signature size of 335 MB for the same ring size. 


Keywords: ID-based ring signature · Applied post-quantum 
cryptography · Symmetric-key primitives 


1 Introduction 


Ring signatures [28] are currently considered one of the most valuable crypto- 
graphic primitives to ensure privacy. They allow a member of a group (i.e. ring) 
to anonymously sign a message on behalf of a group in a spontaneous manner. 
This spontaneity allows signers to form a group of their own choice and to 
generate an anonymous signature. Owing to their great promise in providing 
authenticity and anonymity, ring signatures have attracted a lot of interest from 
the research community. 


ID-based Ring Signature (IDRS): ID-based cryptography [29] was introduced 
in 1984 to erase the need for certificates in public key infrastructures (PKI). 
ID-based cryptography uses the identity of the user as the public key; an 
identity can be, for example, an email address or a name. In this framework, a 
trusted third party named the private key generator (PKG) is required. The PKG 
uses the identity of a user and its own private key to generate the secret 
signing key of the corresponding identity. The interest in combining ID-based 
cryptography and ring signatures is undeniable, as shown by the works 
[4,15,16,20,23,32]. As explained by Chow et al. [16], the main advantage of 
ID-based ring signatures (IDRS) over “traditional” ring signatures in PKI is 
that IDRS provides better spontaneity. Ring signatures in a PKI can only form a 
ring with users who have requested certificates for their public keys, while in 
IDRS signers can form a ring using users’ identities even if those users have 
not requested their secret signing keys from the PKG. Additionally, IDRS may 
significantly reduce the communication overhead that PKI-based ring signatures 
incur in sending the list of ring public keys to the verifier along with the 
signature. The ring IDs may be much shorter and may even be implicitly known 
by the verifier (e.g. all employees of a certain organization/department). 


Post-quantum IDRS: In 2016, the post-quantum (PQ) standardization process 
was launched by NIST and generated considerable attention from researchers. 
This also motivates the design of post-quantum IDRS, as this would provide 
primitives which ensure anonymity without requiring any certificates. There 
exist different PQ candidates to design quantum-safe cryptographic primitives. 
Lattice-based cryptography is currently the most investigated candidate due to its 
promise of flexibility. There are currently multiple lattice-based IDRS; the first 
one by Wang [30] and more recent works by Zhao et al. [33], Wei et al. [31] and 
Cao et al. [12]. To the best of our knowledge, those are the only quantum-safe 
IDRS. 

In this work, we will focus on another promising candidate: symmetric-key 
primitives, for example hash functions or block ciphers. These are old primitives 
providing the advantage of a well-studied and well-understood security. Another 
advantage is that the security of a symmetric-key-based protocol depends only 
on the integrated primitives and not on any assumed hard problem. This means 
that, if a symmetric-key primitive has been broken, it can simply be replaced 
and the design would still be secure. On the contrary, if the hardness assump- 
tion of lattice-based constructions has been broken, then all schemes relying on 
this assumption are not secure anymore. The recent design of zero-knowledge 
proof systems obtained from symmetric-key primitives opens up new directions. 
Zero-knowledge proof systems such as ZKBoo [21], ZKB++ [13], KKW [26], 
ZK-STARK [6], Aurora [7] and Ligero++ [9] allow a user to prove the knowledge 
of a secret witness w such that C(w) = 1, where C is a public circuit similar to 
hash functions. For all these aforementioned reasons, our work studies PQ IDRS 
constructed based on symmetric-key primitives only. 


1.1 Contributions 
The contributions of this work can be presented in the following three parts: 


Generic Post-quantum ID-Based Ring Signatures (Sect. 3): We designed 
a circuit C (see Sect. 3.1) to allow signers to execute a zero-knowledge proof 
that they own a witness w such that C(w) = 1. C is divided into two 
sub-circuits: C1 and C2. C1 proves the membership of the signer in the ring through 
an accumulator (see Definition 4) and C2 proves the knowledge of a signing 
secret key generated by the private key generator PKG. In our generic IDRS 
construction, the signing private key for the identity ID is a digital signature of ID 
generated by PKG; therefore C2 proves the knowledge of a valid digital signature. 
C1 and C2 are linked through a logical “AND” gate (C = C1 · C2). The 
generic circuit is illustrated in Fig. 1. 


Table 1. PicRS and XRS comparisons. PQ = post-quantum, N = ring size, ✓ = proven, 
(✓) = assumed, h = XMSS tree height 

IDRS  | PQ candidate | σ size (asympt.) | Max. N | IDRS.Setup time (asympt.) | SID size (KB) | NIZK     | PQ  | H        | Est. σ size (MB), N = 2^6 | N = 2^12 | N = 2^20 
PicRS | Symmetric    | O(log N)         | unli.  | O(1)                      | 167           | KKW      | ✓   | LowMC    | 169.964                   | 170.153  | 170.406 
PicRS | Symmetric    | O(log N)         | unli.  | O(1)                      | 167           | KKW      | ✓   | SHA3     | 3619                      | 3622     | 3626 
PicRS | Symmetric    | O(log N)         | unli.  | O(1)                      | 167           | Ligero++ | (✓) | SHA3     | 2.046                     | 2.046    | 2.047 
PicRS | Symmetric    | O(log N)         | unli.  | O(1)                      | 167           | Ligero++ | (✓) | MiMC     | 1.902                     | 1.902    | 1.903 
PicRS | Symmetric    | O(log N)         | unli.  | O(1)                      | 167           | Ligero++ | (✓) | Poseidon | 1.898                     | 1.899    | 1.900 
XRS   | Symmetric    | O(log N)         | 2^20   | O(2^h)                    | 4.899         | KKW      | ✓   | LowMC    | 12.487                    | 12.680   | 12.930 
XRS   | Symmetric    | O(log N)         | 2^20   | O(2^h)                    | 4.899         | KKW      | ✓   | SHA3     | 332.300                   | 335.266  | 339.211 
XRS   | Symmetric    | O(log N)         | 2^20   | O(2^h)                    | 4.899         | Ligero++ | (✓) | SHA3     | 1.490                     | 1.491    | 1.493 
XRS   | Symmetric    | O(log N)         | 2^20   | O(2^h)                    | 4.899         | Ligero++ | (✓) | MiMC     | 0.973                     | 0.976    | 0.979 
XRS   | Symmetric    | O(log N)         | 2^20   | O(2^h)                    | 4.899         | Ligero++ | (✓) | Poseidon | 0.885                     | 0.889    | 0.893 
[33]  | Lattice      | O(N)             | unli.  | -                         | 615000        | -        | ✓   | -        | 5                         | 335      | 32243 


Applicable Post-quantum ID-Based Ring Signatures Named PicRS 
and XRS from the Generic Construction (Sect. 4): We implemented 
sub-circuit C1 with a Merkle accumulator (Sect. 4.1) to prove the membership in 
the ring. Sub-circuit C2 can be instantiated in two different ways: 


(1) PKG uses Picnic digital signature [13], which means that a signer needs to 
prove the knowledge of a valid Picnic signature. We designed circuit Picnic.C 
presented in Sect. 4.2, Algorithm 1 and Fig. 3 to prove this statement. 

(2) PKG uses the stateful digital signature XMSS [24], which means that a 
signer needs to prove the knowledge of a valid XMSS signature. We designed 
circuit XMSS.C presented in Sect. 4.2, Algorithm 2 and Fig. 4 to prove this 
statement. 


Picnic.C allows us to design the first IDRS, PicRS, where a signer uses circuit C = 
PicRS.C = Merkle.C (= C1) · Picnic.C (= C2) to generate a signature. The second 
implementation, named XRS, ensues from circuit XMSS.C. In XRS, a signer uses 
circuit C = XRS.C = Merkle.C (= C1) · XMSS.C (= C2) to generate a signature. 
While PicRS is stateless for the signer and PKG, XRS is still stateless for the 
signer but requires PKG to keep an updated state owing to the stateful nature of 
XMSS. The PKG of PicRS can generate an unlimited number of signing secret 
keys and therefore handle an unlimited number of users, while there is a cap on 
the maximum number of users in XRS (e.g. 2^20 users). 


Applicable Constructions Analysis and Optimization (Sect. 5): We evaluate 
our constructions with two different zero-knowledge proof systems: KKW 
[26] and Ligero++ [9]. Each of them is tested with the standard hash function SHA3 
and, additionally, with the following non-standard hash functions to optimize the 
signature size: LowMC [3], MiMC [2] and Poseidon [22]. We optimize the 
complexity of circuits Picnic.C and XMSS.C by testing different parameters for Picnic 
and XMSS to achieve the best possible signature size. In theory, both schemes 
have a signature size that grows logarithmically (O(log N)) with the size of the 
ring, represented by N. This is an improvement when compared with 
lattice-based IDRSs [12,30,31,33], whose signature size grows linearly (O(N)) with 
the ring size, making them unsuitable for large rings. In practice, PicRS and XRS 
signature sizes are nearly constant because proving the knowledge of a valid 
Picnic (PicRS) or XMSS (XRS) signature is the bottleneck of both signature sizes. 
PicRS achieves a size of 1.900 MB while XRS requires only 889 KB for a ring of 
4096 members. Table 1 demonstrates that these sizes are competitive when 
compared to the current state of the art in lattice-based IDRS introduced by Zhao 
et al. [33], which is the only work proposing concrete parameters and allowing 
us to estimate its signature sizes. 
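
The logarithmic growth can be checked against Table 1 itself. The sketch below assumes that the three size columns of Table 1 correspond to ring sizes 2^6, 2^12 and 2^20 (an assumption about the column headings) and computes how much the XRS signature with KKW and SHA3 grows each time the ring size doubles.

```python
LOG2_RING_SIZES = (6, 12, 20)                  # assumed column headings of Table 1
XRS_KKW_SHA3_MB = (332.300, 335.266, 339.211)  # XRS row with KKW and SHA3

def mb_per_doubling(sizes_mb, log2_sizes):
    """Signature growth in MB for each doubling of the ring size."""
    return [
        (next_mb - mb) / (next_log - log)
        for mb, next_mb, log, next_log in zip(
            sizes_mb, sizes_mb[1:], log2_sizes, log2_sizes[1:]
        )
    ]

for step in mb_per_doubling(XRS_KKW_SHA3_MB, LOG2_RING_SIZES):
    print(f"{step:.3f} MB per doubling of N")
```

A constant increment per doubling is what an O(log N) term predicts: each extra level of the Merkle tree adds one more hash evaluation to the proven circuit, on top of a large fixed cost for the signature verification circuit.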


1.2 Overview of Techniques 


At the heart of our generic IDRS, we utilize a non-interactive zero-knowledge 
proof system (NIZK) based on symmetric-key primitives, which allows us to 
prove the knowledge of a witness w such that, for a public circuit C, we have 
C(w) = 1. Traditional zero-knowledge-proof-based digital signatures, such as 
Picnic [13], run the NIZK on a circuit related to the underlying one-way 
function. New challenges arise in IDRS as distinct from a traditional digital 
signature: the proof generated as part of the IDRS signature involves the 
“verification” circuit for the signing secret key generation (i.e. the 
verification algorithm of a standard signature issued by the key generation 
authority). This means that optimising the size of the verification circuit for 
the underlying signature is critical for our IDRS signature size, and we 
focused our efforts in this direction. 

In our generic IDRS construction, a signer with an identity ID owns a witness 
that is the signing secret key SID. SID is a digital signature (see Definition 2), 
generated by PKG, using the signer’s ID as the message. A signer will use the 
NIZK proving procedure on circuit C to generate a signature of a message m. C is 
designed to prove that he knows a valid SID, in other words, a valid digital 
signature generated by PKG for an ID belonging to the ring L. Circuit C takes as 
inputs (i.e. the witness) the signer’s identity ID, the corresponding signing 
secret key SID and the list of identities L (i.e. the ring). More formally, the 
signer will prove the knowledge of (ID, SID, L) such that C(ID, SID, L) = 1. C, 
summarized in Fig. 1, is composed of two main sub-circuits named C1 and C2, 
which are used to prove the membership in the ring and to prove the knowledge 
of a valid digital signature. 

We build applicable constructions by defining and optimizing circuit C 
and both its sub-circuits C1 and C2. C1 is implemented as Merkle.C, which is 
constructed on top of a Merkle accumulator [10,17,26] and a multiplexer (see 
Sect. 4.1) to hide the position of the identity in the accumulator. C2 can be 
implemented with the verification procedure of the Picnic digital signature or 
the verification algorithm of the stateful XMSS digital signature. XRS follows 
the same idea as PicRS but, instead of having a valid Picnic signature as SID, 
each user owns an XMSS signature. The proof of knowledge of a valid signature is 
done through circuit XMSS.C (= C2). Owing to the stateful nature of XMSS, XMSS.C 
requires the use of multiplexers (see Eq. 1) to hide the state and thus provide 
anonymity. 
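
To make the role of the Merkle accumulator concrete, here is a minimal sketch of accumulation, witness generation and witness verification over a list of identities. It is a plain Merkle tree over SHA3-256 with hypothetical function names, not the circuit Merkle.C: in particular, the direction bits of the path are handled in the clear here, whereas the multiplexer in the circuit is what keeps them (and hence the signer's position) hidden inside the proof.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def build_levels(identities):
    """Return all tree levels, leaves first; the ring must be non-empty."""
    level = [H(b"leaf:" + identity.encode()) for identity in identities]
    while len(level) & (len(level) - 1):       # pad to a power of two
        level.append(level[-1])
    levels = [level]
    while len(level) > 1:
        level = [H(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def accumulate(identities) -> bytes:           # plays the role of A.Eval
    return build_levels(identities)[-1][0]

def witness(identities, identity):             # plays the role of A.WitGen
    index = identities.index(identity)
    path = []
    for level in build_levels(identities)[:-1]:
        # Store the sibling hash and whether the current node is a right child.
        path.append((level[index ^ 1], index & 1))
        index //= 2
    return path

def verify(root: bytes, identity: str, path) -> bool:   # plays the role of A.Verify
    node = H(b"leaf:" + identity.encode())
    for sibling, is_right in path:
        node = H(b"node:" + (sibling + node if is_right else node + sibling))
    return node == root

ring = ["alice@example.org", "bob@example.org", "carol@example.org"]
root = accumulate(ring)
print(verify(root, "bob@example.org", witness(ring, "bob@example.org")))
```

The witness contains one hash per tree level, which is where the O(log N) term in the signature size comes from.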


1.3 Outline of the Paper 


This paper is structured as follows. Section 2 formally defines IDRS. Section 3 
presents the generic construction. Section 4 introduces both possible instances 
PicRS and XRS. Section 5 concludes the paper with a full evaluation of the two 
applicable constructions. Appendix A defines the cryptographic primitives used 
in this work. 


2 Definition of ID-Based Ring Signature (IDRS) 


We now formally define an ID-based ring signature. In IDRS, there is a private 
key generator (PKG), which is a trusted entity generating the signing secret 
keys of users. Only users who have received a signing secret key SID from PKG 
can generate a valid and anonymous signature. 


Definition 1 (ID-based ring signature). An ID-based ring signature 
is defined by the tuple of algorithms: IDRS = (IDRS.Setup, IDRS.KeyGen, 
IDRS.Sign, IDRS.Verify) 

- (mpk, msk, param) ← IDRS.Setup(1^λ): This algorithm takes as input the 
  security parameter λ; it produces the master public key mpk, the master secret 
  key msk and the public parameters param. This procedure is executed by the 
  private key generator (PKG). 
- SID ← IDRS.KeyGen(ID, msk): This algorithm takes as input an identity 
  ID ∈ {0,1}* and the master secret key msk; it outputs the signer’s secret 
  signing key SID. This procedure is executed by the private key generator PKG 
  and the result is transmitted to the user with the identity ID. 
- σ ← IDRS.Sign(m, L, ID, SID, mpk, param): This algorithm takes as input the 
  message m, a list L of N identities, the identity of the signer ID, the signing 
  secret key SID of the member ID, where ID ∈ L, the master public key mpk 
  and the public parameters param. It outputs a ring signature σ. 
- 0/1 ← IDRS.Verify(m, L, σ, mpk, param): This algorithm takes as input a ring 
  signature σ, a message m, the ring list L, the master public key mpk and 
  the public parameters param. It outputs 1 if σ is valid and generated by one 
  ID ∈ L, 0 otherwise. 


A secure ID-based ring signature achieves unforgeability and anonymity, which 
are defined in the full version [11]. 


3 Generic Construction for ID-Based Ring Signature 
from Symmetric-Key Primitives 


This section introduces our proposed generic construction of IDRS based on 
symmetric-key primitives. A key part of our proposal is that symmetric-key 
based zero-knowledge proof systems (NIZK) give us the ability to prove the 
knowledge of an input (i.e. witness) w of a circuit C such that C(w) = 1. In an 
IDRS, the signer needs to demonstrate that (1) he owns a secret key generated 
by the central authority PKG and that (2) his identity belongs to the ring. This 
requires a circuit C that proves both statements. 


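[Figure: circuit C takes L, ID and SID as inputs; sub-circuit C1 checks ID ∈ L, 
sub-circuit C2 checks 1 = DS.Verify(ID, SID, mpk), and an AND gate combines both.] 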


Fig. 1. Generic circuit C 


The basic idea behind our generic IDRS is that the PKG possesses a digital 
signature (see Definition 2) key pair as master public key mpk and master 
private key msk (i.e. mpk = DS.pk, msk = DS.sk). Then, each user with an 
identity ID requests a signing key SID from PKG, which generates a digital 
signature DS.σ taking the identity ID as the message; in other words, PKG 
computes SID ← DS.Sign(ID, msk). To sign a message m, a user in possession of a 
signing secret key generated by PKG proves, through a NIZK (see Definition 3), 
the knowledge of an SID associated with an identity ID belonging to the ring L 
(i.e. ID ∈ L). We designed a generic circuit C (see Fig. 1) which proves the 
validity of both statements. C is divided into two sub-circuits: C1 to prove 
ID ∈ L and C2 to prove 1 = DS.Verify(ID, SID, mpk). C1 and C2 are combined 
with an “AND” gate to form the overall circuit C. 
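
The key issuance step can be illustrated with any signature scheme built from a hash function. The sketch below uses a Lamport one-time signature over SHA3-256 purely as a stand-in for DS; it is not one of the schemes used in this work, and all function names are hypothetical. A Lamport key pair may safely sign only one message, so this toy PKG could serve a single identity, which is one reason a practical instantiation needs a many-time scheme.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def message_bits(message: bytes):
    """The 256 bits of the message hash, most significant bit first."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def ds_keygen():
    """Stand-in for DS.KeyGen: two random preimages per message bit."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(zero), H(one)) for zero, one in sk]
    return pk, sk

def ds_sign(message: bytes, sk):
    """Stand-in for DS.Sign: reveal one preimage per message bit."""
    return [sk[i][bit] for i, bit in enumerate(message_bits(message))]

def ds_verify(message: bytes, signature, pk) -> bool:
    """Stand-in for DS.Verify: hash each revealed value and compare."""
    if len(signature) != 256:
        return False
    bits = message_bits(message)
    return all(H(value) == pk[i][bit]
               for i, (value, bit) in enumerate(zip(signature, bits)))

mpk, msk = ds_keygen()                 # PKG key pair: mpk = DS.pk, msk = DS.sk
identity = b"alice@example.org"
sid = ds_sign(identity, msk)           # SID <- DS.Sign(ID, msk)
print(ds_verify(identity, sid, mpk))   # the check that sub-circuit C2 encodes
```

In the IDRS itself the signer never reveals SID; the last check is instead proven in zero knowledge inside circuit C.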


3.1 Generic IDRS Algorithms 


We now formally define the algorithms for our generic construction of IDRS, 
which follows Definition 1. 


(mpk, msk, param) ← IDRS.Setup(1^λ): This algorithm is executed by PKG and 
performs the following steps: 
- (mpk, msk) ← DS.KeyGen(1^λ) (see Definition 2) 
- param ← A.Gen(1^λ) (see Definition 4) 
- Return (mpk, msk, param) 

SID ← IDRS.KeyGen(ID, msk): This algorithm is executed by PKG at the request 
of the user with the identity ID. It computes a digital signature DS.σ using 
the ID as the message. The digital signature DS.σ is transmitted to the user 
with identity ID and becomes his signing secret key SID. This procedure executes 
the following steps: 
- SID ← DS.Sign(ID, msk) (see Definition 2) 
- Return SID 

σ ← IDRS.Sign(m, L, ID, SID, mpk, param): This procedure takes as inputs the
message m, the set of N identities L (in other words, the ring), the signing
secret key SID = DS.σ, the master public key mpk and the public parameters
param, which are the initial public key of an empty accumulator. It executes
the following steps:

- (A_L, A.pk) ← A.Eval(param, L): This “accumulates” the set of identities
belonging to the ring L. It returns an accumulator A_L and its updated
public key A.pk.

- w_ID ← A.WitGen(A.pk, A_L, L, ID): This returns the witness w_ID for the
identity of the signer, which will be used to prove that the signer's ID is
included in the accumulator A_L.

- π ← NIZK.Prove((m, A.pk, A_L, mpk), (SID, ID, w_ID)) (see Definition 3): The
secret witness is w = (SID, ID, w_ID); the public statement is composed of
the message m, the accumulator public key A.pk, the accumulator A_L built
on the ring and the PKG public key mpk (x = (m, A.pk, A_L, mpk)). The
tuple (x, w) ∈ R if and only if the following statements hold:

(1) 1 = A.Verify(A.pk, A_L, w_ID, ID): This is equivalent to proving ID ∈ L, so
it will be proven through sub-circuit C1 (see Fig. 1).

(2) 1 = DS.Verify(ID, SID, mpk): This will be proven through sub-circuit
C2 (see Fig. 1).

Both statements are proven separately through the sub-circuits C1 and
C2, which are linked together with an “AND” gate to form the whole circuit
C (see Fig. 1) and ensure that both statements are valid. The message to be
signed, m, is embedded by integrating it into the Fiat-Shamir transform¹
[19] to generate the challenge.

- σ ← π

- Return σ

0/1 ← IDRS.Verify(m, L, σ, mpk, param): This algorithm takes as inputs the
message m, the list of identities L and a ring signature σ. It verifies the
validity of σ by executing the following steps:

- π ← σ

- (A_L, A.pk) ← A.Eval(param, L): The verifier constructs the accumulator for
the set of identities belonging to the ring L. It returns an accumulator A_L
and its updated public key A.pk.

- Return NIZK.Verify((m, A.pk, A_L, mpk), π): This returns 1 if the proof π
is valid for circuit C.
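
The relation R that the NIZK proves can be pictured as a plain predicate over the public statement and the secret witness. The following minimal Python sketch only illustrates the “AND” composition of C1 and C2. The names `Statement`, `Witness`, `relation_holds`, `acc_verify` and `ds_verify` are hypothetical stand-ins (for A.Verify and DS.Verify in particular), and evaluating the predicate in the clear, as done here, offers no zero-knowledge at all.

```python
from typing import Any, Callable, NamedTuple


class Statement(NamedTuple):
    m: bytes         # message to be signed (bound to the proof via Fiat-Shamir)
    acc_pk: bytes    # accumulator public key A.pk
    acc: bytes       # accumulator A_L built on the ring
    mpk: bytes       # PKG master public key


class Witness(NamedTuple):
    sid: bytes       # signing secret key SID, i.e. a signature on ID
    identity: bytes  # signer identity ID
    w_id: Any        # accumulator witness w_ID


def relation_holds(x: Statement, w: Witness,
                   acc_verify: Callable[..., bool],
                   ds_verify: Callable[..., bool]) -> bool:
    """Return True iff (x, w) is in R, i.e. C1 AND C2 both accept."""
    c1 = acc_verify(x.acc_pk, x.acc, w.w_id, w.identity)  # ID is in the ring L
    c2 = ds_verify(w.identity, w.sid, x.mpk)              # SID is valid for ID
    return bool(c1) and bool(c2)
```

In the actual scheme the prover never reveals w; the NIZK only convinces the verifier that some w satisfying this predicate exists.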


1 The Fiat-Shamir transform converts an interactive protocol into a non-interactive
protocol. It generates the challenge c as an output of H (c = H(r, m)), where m is the
message to be signed in IDRS, instead of receiving it from the verifier.
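
As a rough illustration of the challenge derivation c = H(r, m), the sketch below hashes the prover's first message r together with the message m. Using SHA3-256 for H and length-prefixing the two inputs are assumptions made for this example; the function name `fiat_shamir_challenge` is hypothetical.

```python
import hashlib


def fiat_shamir_challenge(r: bytes, m: bytes) -> bytes:
    """Derive the challenge c = H(r, m) locally instead of asking a verifier."""
    h = hashlib.sha3_256()
    # Length-prefix each input so that the pair (r, m) is encoded unambiguously.
    for part in (r, m):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()
```

Because m enters the hash, a proof produced for one message yields a different challenge for any other message, which is what binds the proof to m.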

3.2 Security Analysis


The security of our generic IDRS depends on the symmetric-key primitives used,
namely an EU-CMA secure digital signature scheme DS (see Definition 2), a secure
accumulator A (see Definition 4), a cryptographic hash function H and
a secure NIZK proof system NIZK (see Definition 3). All security proofs are
presented in the full version of the paper [11].


Theorem 1 (Unforgeability). Let IDRS be the construction provided in
Sect. 3.1 with a cryptographic hash function H, an EU-CMA secure digital
signature scheme DS, a secure accumulator A and a secure non-interactive
zero-knowledge proof system NIZK. Then, IDRS achieves unforgeability.


Theorem 2 (Anonymity). Let IDRS be the construction provided in Sect. 3.1
with a cryptographic hash function H, an EU-CMA secure digital signature
scheme DS, a secure accumulator A and a secure non-interactive zero-knowledge
proof system NIZK. Then, IDRS achieves anonymity.


4 IDRS: Applicable Constructions 


This section introduces possible applicable constructions of the generic IDRS
presented in Sect. 3. Section 4.1 presents a practical construction of
sub-circuit C1 and discusses its security, while Sect. 4.2 presents
possible constructions of sub-circuit C2 and discusses their
security. The section ends with a summary of two possible implementations of
the generic IDRS. We assume that the hash function H outputs a string of 2λ bits,
where λ is the post-quantum security level.


4.1 Sub-circuit C1


Sub-circuit C1 (see Fig. 1) aims to prove the first statement 1 =
A.Verify(A.pk, A_L, w_ID, ID), which is equivalent to proving ID ∈ L or, in other
words, that the identity ID of the signer belongs to the ring L. As previously stated,
we use an accumulator to prove membership in the ring. Applying a NIZK
based on symmetric-key primitives requires that the verification of a valid
witness, i.e. the A.Verify algorithm, can be expressed as a one-way circuit. This
leads us to circuit Merkle.C, derived from the Merkle accumulator.


Merkle Accumulator and Circuit Merkle.C: The Merkle accumulator 
respects Definition 4 and “accumulates” the set of identities L as a Merkle tree 
[27]. Each identity of the ring L is a leaf in a Merkle tree. A Merkle tree is a 
binary tree in which each internal node is the hash of both its children. The 
accumulator’s public key A.pk is the tree root as illustrated in Fig. 2a. 

The membership proof consists of demonstrating knowledge of the path
from the leaf associated with the signer's identity to the root of the tree. This can
be represented as a circuit Merkle.C(w_ID, ID) = A.pk, where A.pk is the Merkle
root and w_ID is the authentication path for the identity ID, composed of internal
nodes of the Merkle accumulator/tree and log N bits that indicate the direction
of the path. Merkle.C is formally presented in Algorithm 2 and is composed
of log N calls to the hash function H, where N is the ring size. Additionally, at each
level of the Merkle accumulator, Merkle.C goes through a multiplexer μ, which
is defined as follows:


\[
\mu(x, y, b) =
\begin{cases}
(x, y) & \text{if } b = 0, \\
(y, x) & \text{if } b = 1.
\end{cases}
\tag{1}
\]


μ orders the inputs of H depending on whether the path comes from the left or
the right in the tree. μ hides the path's direction from ID to the root A.pk, hence
ensuring the anonymity of our IDRS. μ can be written as a circuit μ(x, y, b) =
(b̄·x + b·y, b̄·y + b·x), where b̄ = 1 − b. Figure 2a depicts an example of a Merkle
accumulator and Merkle.C: the corresponding witness for ID is w_ID = ((w1, 1), (w2, 0)),
where Merkle.C(w_ID, ID) = H(μ(H(μ(ID, w1, 1)), w2, 0)). Without the multiplexer,
the position of ID would be identifiable from the path, meaning that
anonymity could not be provided.
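
The path computation with the multiplexer can be sketched as follows. This is a minimal illustration, assuming SHA3-256 as H, a ring size N that is a power of two, and identities of equal byte length; the function names are hypothetical and the code evaluates the circuit in the clear rather than inside a NIZK.

```python
import hashlib


def H(left: bytes, right: bytes) -> bytes:
    """Two-input compression function (assumed here to be SHA3-256)."""
    return hashlib.sha3_256(left + right).digest()


def mux(x: bytes, y: bytes, b: int):
    """Multiplexer of Eq. (1): (x, y) if b == 0, (y, x) if b == 1."""
    return (x, y) if b == 0 else (y, x)


def merkle_eval(leaves):
    """Accumulate the ring: return all tree levels, the last one holds A.pk."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([H(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_witness(levels, index):
    """Authentication path w_ID as a list of (sibling, direction bit) pairs."""
    path = []
    for level in levels[:-1]:
        bit = index & 1                     # 1 if the current node is a right child
        path.append((level[index ^ 1], bit))
        index >>= 1
    return path


def merkle_circuit(w_id, identity):
    """Merkle.C(w_ID, ID): recompute the root from the leaf and its path."""
    node = identity
    for sibling, bit in w_id:
        node = H(*mux(node, sibling, bit))  # the bit decides the input order of H
    return node
```

For a ring of four identities, every member obtains a two-step path, and `merkle_circuit` returns the same root for each of them, so the root alone does not reveal which leaf was used.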


(a) Example: Merkle accumulator with N = 4, n1 = H(ID2, ID). (b) Merkle.C representation.


Fig. 2. Merkle Accumulator and circuit Merkle.C


In conclusion, Merkle.C plays the role of sub-circuit C1 in our general circuit
(see Fig. 1): the proof of membership in the ring can be instantiated in practice
by the circuit Merkle.C presented in this section. The complexity of Merkle.C
grows logarithmically with the number of identities in the ring.


Security and One-Wayness of Merkle.C: As the security analysis in Sect. 3.2
shows, we require a secure accumulator A, in our case a secure Merkle accumulator,
whose security comes from the properties of the hash function H (see Definition 5).
The collision-freeness of the accumulator comes directly from the collision
resistance of H. This means that it is computationally infeasible to
construct the same accumulator from two different sets of ring members. The
one-wayness of Merkle.C follows from the one-wayness of the hash function H. This
circuit is a Merkle tree with a public root. It should be computationally infeasible to
recover any of the leaves from the root, and this is achieved because the root is
an output of the hash function H, which is a one-way function.


4.2 Sub-circuit C2 


C2 (see Fig. 1) aims to prove the validity of the signing secret key SID, namely to
prove that 1 = DS.Verify(ID, SID, mpk). Therefore, the verification procedure
of the digital signature scheme must be expressed as a one-way circuit. The
current state of the art in digital signatures based on symmetric-key primitives
gives us two possible digital signatures: the stateless Picnic signature and the
stateful XMSS signature, which meet the requirements of Theorems 1 and
2. Both schemes were chosen because they are considered either a standard
post-quantum signature (XMSS) or an alternate candidate (Picnic) in the NIST
standardization process. It is important to note that other alternatives exist [1]. We
provide the pseudo-code of both circuits Picnic.C and XMSS.C in Algorithms 1
and 2 to present a clear evaluation later in this work. A more detailed explanation
of both circuits is available in the full version of the paper [11].


[Figure: high-level view of circuit Picnic.C, which takes SID as input and is composed of the sub-circuits Picnic.c1, Picnic.c2, Picnic.c3 and Picnic.c4.]


Fig. 3. Circuit Picnic.C


Circuit Picnic.C: The goal of this circuit is to prove that the signer possesses a
valid Picnic signature generated by PKG. Picnic and a detailed description of its
parameters are presented in the full version of the paper [11]. More formally, the
signer proves knowledge of SID such that 1 = Picnic.Verify(ID, SID, mpk) with
the NIZK.Prove procedure and circuit Picnic.C. Circuit Picnic.C takes as inputs
SID = (C, P, {seed*_j, h'_j}_{j∉C}, {{state_{j,i}, r_{j,i}}_{i≠p_j}, com_{j,p_j}, {ẑ_{j,α}}, msgs_{j,p_j}}_{j∈C})
(the Picnic signature) and the signer identity ID. To facilitate the understanding
of the circuit, we present a high-level picture of Picnic.C in Fig. 3 and we divide it
into four sub-circuits Picnic.c1, Picnic.c2, Picnic.c3 and Picnic.c4. Each sub-circuit
executes a specific step of the Picnic verification procedure and is presented in
Algorithm 1.


One-Wayness and Security of Picnic.C: The unforgeability of Picnic has
been proven in [13,26] and therefore provides the desired security according to
Theorems 1 and 2. The one-wayness of Picnic.C follows from the one-wayness
of the four sub-circuits presented in Algorithm 1. The one-wayness of Picnic.c1
depends on the one-wayness of the cryptographic hash function H. Indeed, each
step involves only calls to the hash function H; therefore, it is computationally
infeasible to invert Picnic.c1. Picnic.c2 is also one-way for the same
reason. The one-wayness of Picnic.c3 depends on the one-wayness of the following
equation (ẑ_α·[λ_β] ⊕ ẑ_β·[λ_α] ⊕ [λ_{α,β}] ⊕ [λ_γ] ⊕ ẑ_α·ẑ_β), on which the NIZK is
executed. This six-input equation outputs 0 with probability 50%, which makes it
one-way. The one-wayness of Picnic.c4 follows directly from the one-wayness of
the hash function G.


Algorithm 1. Picnic.C circuit, see [11] for parameters’ description


SID = (C, P, {seed*_j, h'_j}_{j∉C}, {{state_{j,i}, r_{j,i}}_{i≠p_j}, com_{j,p_j}, {ẑ_{j,α}}, msgs_{j,p_j}}_{j∈C})


Picnic.c1
Input: SID
Output: {h_j}_{j∈C}
1: for j ∈ C do
2:   h_j = H(H(state_{j,1}, r_{j,1}), ..., com_{j,p_j}, ..., H(state_{j,n}, r_{j,n}))
3: end for
4: return {h_j}_{j∈C}

Picnic.c2
Input: SID
Output: {h_j}_{j∉C}
1: for j ∉ C do
2:   {seed_{j,i}}_{i=1}^{n} = Tree(seed*_j, n) (see Table 2)
3:   (state_{j,i}, r_{j,i}) ← ComputeState(seed_{j,i}) (see Table 2)
4:   h_j = H(H(state_{j,1}, r_{j,1}), ..., H(state_{j,n}, r_{j,n}))
5: end for
6: return {h_j}_{j∉C}

Picnic.c3
Input: SID
Output: {h'_j}_{j∈C}
1: for j ∈ C do
2:   Execute C, where C is the circuit used for the Picnic signature.
3:   for all {P_i}_{i≠p_j} do
4:     for α ∈ [AND] do
5:       ẑ_α·[λ_β] ⊕ ẑ_β·[λ_α] ⊕ [λ_{α,β}] ⊕ [λ_γ] ⊕ ẑ_α·ẑ_β (see Fig. 3)
6:     end for
7:   end for
8:   h'_j = H({ẑ_{j,α}}, msgs_{j,1}, ..., msgs_{j,n})
9: end for
10: return {h'_j}_{j∈C}

Picnic.c4
Input: {h_j}_{j∈C}, {h'_j}_{j∈C}, {h_j, h'_j}_{j∉C}, ID
Output: b
1: b = ((C, P) == G(ID, h_1, h'_1, ..., h_M, h'_M))
2: return b

Picnic.C
Input: SID, ID
Output: 0/1
1: b = Picnic.c4(Picnic.c1(SID), Picnic.c2(SID), Picnic.c3(SID), ID)
2: return b

It is important to note that in Picnic, G computes the challenge (Fiat-Shamir
transform) and is modelled as a random oracle. Producing NIZKs for calls
to a random oracle is not possible, but in practice G is instantiated as a
hash function, which makes PicRS possible.


Table 2. Additional Functions and parameters summary


r ← MT(S) | This circuit constructs the Merkle root r of a binary tree from a set of leaves S. This circuit executes H |S| − 1 times.

{leaf_i}_{i=1}^{leaves} ← Tree(seed, depth) | This circuit takes as input a seed seed and an integer depth indicating the number of leaves of the binary tree. This procedure executes H 2·(leaves − 1) times.

(state, r) ← ComputeState(seed) | This circuit takes as input a seed and outputs the tuple (state, r). This circuit executes H once.

· | The multiplication/“AND” gates in circuits

+ | The addition/“OR” gates in circuits


XMSS.C Circuit: We define circuit XMSS.C, which will be used to prove
knowledge of a valid SID such that 1 = XMSS.Verify(ID, SID, mpk). We divide
circuit XMSS.C into two sub-circuits, XMSS.c1 and XMSS.c2, presented in
Algorithm 2 and described in more detail in the full paper [11]. It is crucial to
highlight lines 1 and 3 of sub-circuit XMSS.c2 in Algorithm 2. These lines execute
the XMSS path verification, but we added a multiplexer (the same as the one
used for the Merkle accumulator presented in Sect. 4.1). This addition ensures
anonymity, as it hides the position of WOTS+.pk in PKG's XMSS tree,
thereby providing unlinkability between two signatures generated by the same
signer. To ease understanding, we also provide a high-level representation of
XMSS.C in Fig. 4.
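
The hash-chain step at the heart of XMSS.c1 can be illustrated with a simplified Winternitz one-time signature. The sketch below is an assumption-laden toy: it uses plain WOTS chains without the bitmasks and keyed hashing of WOTS+, SHA3-256 as H, toy values for `LEN1` and `LEN2`, and hypothetical function names. It also omits the final compression of the public key with MT.

```python
import hashlib

W = 4        # Winternitz parameter Wint: every digit lies in [0, W - 1]
LEN1 = 8     # number of message digits (toy value, an assumption)
LEN2 = 3     # checksum digits: W**LEN2 = 64 exceeds LEN1 * (W - 1) = 24
LEN = LEN1 + LEN2


def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def chain(value: bytes, steps: int) -> bytes:
    """Apply H to value `steps` times."""
    for _ in range(steps):
        value = H(value)
    return value


def base_w(n: int, count: int):
    """Little-endian base-W digits of n, truncated to `count` digits."""
    out = []
    for _ in range(count):
        out.append(n % W)
        n //= W
    return out


def message_digits(identity: bytes, r: bytes):
    """md || c: digits of H(ID, r) followed by the checksum digits."""
    md = base_w(int.from_bytes(H(identity + r), "big"), LEN1)
    checksum = sum(W - 1 - d for d in md)
    return md + base_w(checksum, LEN2)


def keygen(seed: bytes):
    sk = [H(seed + bytes([i])) for i in range(LEN)]
    pk = [chain(s, W - 1) for s in sk]       # end of every chain
    return sk, pk


def sign(sk, digits):
    return [chain(s, d) for s, d in zip(sk, digits)]


def pk_from_signature(sig, digits):
    """Core of XMSS.c1: pk_i = H^(W - 1 - md_i)(sigma_i)."""
    return [chain(s, W - 1 - d) for s, d in zip(sig, digits)]
```

The verifier never sees the secret chain starts: it only walks each chain forward from the signature element to its end and compares the result with the public key.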


[Figure: high-level view of circuit XMSS.C with its sub-circuits XMSS.c1 and XMSS.c2, where SID = (WOTS+.σ, auth, idx), auth = (auth1, auth2) and idx = (idx1, idx2).]


Fig. 4. XMSS.C

One-Wayness and Security of XMSS.C: XMSS.C is composed of two sub-circuits
presented in Algorithm 2. The one-wayness of XMSS.C follows directly
from the unforgeability of the XMSS scheme used by PKG and from the one-wayness
of the hash function. Indeed, according to [24], the security of XMSS depends on the
one-wayness of the hash function: it is computationally infeasible to compute
the WOTS+ public key from the XMSS tree root, which is mpk. It is therefore
computationally infeasible to invert XMSS.c1 from mpk, and likewise XMSS.c2.
The information leaked by the stateful nature of XMSS is hidden by integrating
the multiplexer μ into circuit XMSS.c2. Because of the multiplexer, the position of
the WOTS+ signature in PKG's XMSS tree is hidden, which prevents
linking two signatures generated by the same signer.


4.3 Applicable Post-quantum IDRSs from Symmetric-Key 
Primitives 


This section introduced three possible circuits: Merkle.C, which can take the role
of C1, and Picnic.C and XMSS.C, which can both be implemented as C2. From these
circuits, we can implement two different IDRSs.


Post-quantum IDRS from Picnic Named PicRS: PicRS follows the generic
construction presented in Sect. 3.1 and uses Picnic as the digital signature DS. This
means that the master public and private keys mpk and msk are set to msk =
Picnic.sk and mpk = Picnic.pk, and each user with an identity ID owns a signing
secret key SID = Picnic.σ such that 1 = Picnic.Verify(ID, SID, mpk). In PicRS,
the general circuit C (see Fig. 1), named PicRS.C, is where the signer executes
the NIZK proof. We have C = PicRS.C = Merkle.C · Picnic.C. PicRS security comes
from the EU-CMA security of Picnic proven in [14], from the properties of the
Merkle accumulator and from the one-wayness of circuits Merkle.C and Picnic.C
discussed in Sects. 4.1 and 4.2.


Algorithm 2. XMSS.C circuit, see [11] for parameters’ description

SID = (WOTS+.σ, idx, auth)


XMSS.c1
Input: ID, WOTS+.σ
Output: WOTS+.pk'
1: md = (md_1, ..., md_{len_1}) ← H(ID, r)
2: c = (md_{len_1+1}, ..., md_{len}) ← Σ_{i=1}^{len_1} (Wint − 1 − md_i)
3: md = (md || c)
4: for i = 1 to len do
5:   WOTS+.pk_i = H^{Wint − 1 − md_i}(WOTS+.σ_i)
6: end for
7: WOTS+.pk' = MT({WOTS+.pk_i}_{i=1}^{len}) (see Table 2)
8: return WOTS+.pk'

XMSS.c2
Input: WOTS+.pk', idx, auth
Output: b
1: hash_1 ← H(μ(WOTS+.pk', auth_1, idx_1)) {where μ is the multiplexer described in Sect. 4.1 and presented in Eq. 1}
2: for i = 2 to h do
3:   hash_i = H(μ(hash_{i−1}, auth_i, idx_i))
4: end for
5: return hash_h == mpk

XMSS.C
Input: SID
Output: 0/1
1: b = XMSS.c2(XMSS.c1(SID))
2: return b

Merkle.C
Input: w_ID = (w_i, b_i)_{i=1}^{log N}, ID
Output: 0/1
1: a_1 = H(μ(ID, w_1, b_1))
2: for i = 2 to log N do
3:   a_i = H(μ(a_{i−1}, w_i, b_i))
4: end for
5: return A.pk == a_{log N}


Post-quantum IDRS from XMSS Named XRS: The idea of the
XMSS-based IDRS construction is similar to the PicRS construction, but the
Picnic digital signature is replaced with the XMSS signature. This means that
in XRS, each signer executes the NIZK on circuit C = XRS.C = Merkle.C · XMSS.C,
where XMSS.C proves knowledge of a valid XMSS signature (i.e. 1 =
XMSS.Verify(ID, SID, mpk)). XRS security comes from the EU-CMA security of
XMSS proven in [24], from the properties of the Merkle accumulator and from
the one-wayness of circuits Merkle.C and XMSS.C discussed in Sects. 4.1 and 4.2.


5 Evaluation 


This section analyzes the signature sizes of both applicable constructions, PicRS
and XRS, for a post-quantum security level of λ = 128 bits. We evaluate both
constructions using two different non-interactive zero-knowledge proof systems
(NIZKs): KKW [26] and Ligero++ [9]. We chose to work with these NIZKs
because, on the one hand, KKW is considered the current state of the art in
symmetric-key-based NIZKs. It has been analyzed in the QROM, it is
considered by NIST as an alternate candidate in the standardization process
[1], and it provides all the security properties that our constructions require. On
the other hand, Ligero++ has been less studied and its post-quantum security
is only assumed, but according to the literature it achieves the most competitive
proof size for large circuits when compared to other works such as Aurora [7]
or ZK-STARK [6], which could be promising alternative NIZKs. KKW is optimized
to work on binary circuits, while Ligero++ is optimized for arithmetic
circuits. We implemented our schemes using different hash functions, which are
considered either binary or arithmetic circuits. We use the standard hash
function SHA3, but we also tested our constructions with other hash functions
that have an optimized complexity, which decreases the overall size of the circuit
and of the signature. The complexity of all circuits is expressed in terms
of the “number of H executions”, where one execution is a call to the compression
function H(x, y) = z with x, y, z ∈ {0,1}*. If H has n inputs, the number of
counted executions is n − 1.

In the rest of this section, we discuss the complexity of circuit Merkle.C.
We then discuss the optimization of circuits Picnic.C and XMSS.C in
order to obtain the shortest possible signature for PicRS and XRS. We express
the complexity in terms of the number of hash executions. We conclude the paper
with a comparison between both schemes, a comparison with the current
state of the art in post-quantum IDRS constructed from lattices, and some final
recommendations.


Merkle.C Complexity. The complexity of Merkle.C, which is used to “accumulate”
all identities in the ring L, increases logarithmically (O(log N)) with
the size of the ring. This follows from the Merkle tree structure of the
accumulator (see Sect. 4.1). The complexity of circuit Merkle.C can be expressed as
|Merkle.C| = log N · (|H| + |μ|), where |H| is the complexity of the hash
function H, |μ| is the complexity of the multiplexer (see Eq. 1), and N is the ring size.
Merkle.C is implemented in both constructions; therefore, this discussion is valid
for both PicRS and XRS.
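
The formula can be evaluated with a few lines of Python. This is only an arithmetic sketch: the function name `merkle_circuit_cost` is hypothetical, the ring size is assumed to be a power of two, and the unit costs passed for |H| and |μ| are placeholders rather than measured values.

```python
def merkle_circuit_cost(ring_size: int, hash_cost: int, mux_cost: int) -> int:
    """|Merkle.C| = log2(N) * (|H| + |mu|) for a ring of N = 2^k identities."""
    depth = ring_size.bit_length() - 1
    if ring_size < 1 or 2 ** depth != ring_size:
        raise ValueError("ring_size must be a power of two")
    return depth * (hash_cost + mux_cost)
```

Doubling the ring size adds a single tree level, so the cost grows by only one extra |H| + |μ| term.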


5.1 PicRS Signature’s Size 


The signature size of PicRS depends on the complexity of the circuit PicRS.C =
Merkle.C · Picnic.C (see Sect. 4.3). To analyze the PicRS signature size, we need
to investigate circuit Picnic.C, described in Sect. 4.2 and in Algorithm 1. We
optimize the complexity of circuit Picnic.C by testing different parameters, proposed
in [25], for the Picnic digital signature used by PKG. We express
the complexity of Picnic.C as a function of the number of executions of the hash
functions H and G and of the number of multiplication and addition gates. We
consider that an execution of H has two inputs, and therefore the number of calls
to H grows linearly with the number of inputs.


Table 3. Picnic.C’s complexity for different Picnic parameters n, M, and τ. It shows
the number of executions of the hash functions H and G, and the number of
multiplication (Mult.) and addition (Add.) gates


n | M | τ | Picnic.c1 H | Picnic.c2 H | Picnic.c3 H | Picnic.c3 Mult. | Picnic.c3 Add. | Picnic.c4 G | Picnic.C H | Picnic.C G | Picnic.C Mult. | Picnic.C Add.
3 | 438 | 438 | 1752 | 0 | 5256 | 2680560 | 244831488 | 1 | 7008 | 1 | 2680560 | 244831488
16 | 604 | 68 | 2040 | 1673392 | 4352 | 3121200 | 285077760 | 1 | 1679784 | 1 | 3121200 | 285077760
64 | 803 | 50 | 6300 | 2495442 | 12800 | 9639000 | 880387200 | 1 | 2514542 | 1 | 9639000 | 880387200


Table 3 demonstrates that Picnic.C achieves its optimal circuit size for the
Picnic scheme with the parameters n = 3, τ = 438, and M = 438. This comes
principally from the fact that such an instance does not need to execute
sub-circuit Picnic.c2, as M = τ. Therefore, for the rest of the analysis of PicRS, we
use the Picnic parameters (n = 3, τ = 438 = M) for the digital signature of PKG.
Table 1 shows the signature sizes of PicRS for different ring sizes with different
hash functions and NIZKs.


5.2 XRS Signature’s Size 


The XRS signature size depends on the complexity of circuit XRS.C, composed of
sub-circuits XMSS.C and Merkle.C (see Sect. 4.3), on which the NIZK.Prove algorithm
is executed. In this part, we focus on the sub-circuit XMSS.C: its
first internal sub-circuit XMSS.c1 is composed of len·Wint + len − 1 executions of
the hash function H, and its second sub-circuit XMSS.c2 consists of h calls to
H and the multiplexer μ. Table 4 presents the complexity of circuit XMSS.C with
different parameters for the XMSS scheme used by PKG to generate all secret
signing keys. We used parameters proposed in the original XMSS paper [24]. Our
results demonstrate that XMSS should be implemented with the parameters
Wint = 4, len = 133, and h = 20 to minimize the complexity of XMSS.C. It is
important to note that 2^h is the maximum number of SIDs that can be generated
by PKG in this case. The reason for setting h = 20 is detailed in
Sect. 5.3.


Table 4. XMSS.C circuit sizes for each sub-circuit for different Wint and len.


Wint | len | h | |XMSS.c1| | |XMSS.c2| | |XMSS.C|
4 | 133 | 20 | 665·|H| | 20·(|H| + |μ|) | 685·|H| + 20·|μ|
16 | 67 | 20 | 1139·|H| | 20·(|H| + |μ|) | 1159·|H| + 20·|μ|
64 | 44 | 20 | 2880·|H| | 20·(|H| + |μ|) | 2900·|H| + 20·|μ|

After optimizing the complexity of XMSS.C, we evaluate the signature
size of XRS for different ring sizes and with different NIZKs and hash functions.
The details of XRS features and results are presented in Table 1.


5.3 PicRS vs XRS 


Choice of NIZK: As presented in Table 1, our implementations using Ligero++
as the NIZK are the best options when targeting signature size optimization. It
works better for large circuits, as its proof size grows logarithmically with the
circuit size, while that of KKW grows linearly with the number of multiplication
gates in the circuit. For this reason, our Ligero++-based implementations can use
the standard hash function SHA3 and still achieve a decent signature size, while
our KKW-based implementation requires a specifically designed hash function
(LowMC) to be competitive. To the best of our knowledge, KKW is optimized
for binary circuits such as LowMC, while Poseidon and MiMC are arithmetic
circuits that work over larger finite fields, which makes them impractical for
KKW. However, KKW has been submitted to the NIST standardization process
[1], thus giving stronger security guarantees than Ligero++. Ligero++
was only published recently, and its post-quantum security has been assumed, as
it relies on known post-quantum paradigms, but it has not been proven.


Signature Size: Table 1 summarizes the performance of both schemes in terms
of signature size. XRS clearly outperforms PicRS due to the lower complexity of
circuit XMSS.C (see Table 4) used in XRS compared to Picnic.C (see Table 3)
implemented in PicRS. Even if in theory both signature sizes should increase
logarithmically with the ring size, we observe that both schemes provide a nearly
constant signature size because the circuit complexity of PicRS.C and XRS.C
depends mainly on Picnic.C and XMSS.C. Picnic.C represents 99% of PicRS.C's
complexity and XMSS.C 95% of XRS.C’s complexity for the largest ring N = 27°. 
Because the Ligero++ proof size increases only logarithmically with the circuit
size, while that of KKW increases linearly with the number of multiplication
gates, PicRS and XRS implemented with Ligero++ have a “more constant”
signature size than the ones implemented with KKW (see Table 1). A possible
optimization for XRS could be replacing the WOTS+ scheme with a few-time signature
scheme named FORS [8] in the XMSS scheme used by PKG, which should further
reduce the signature size for XRS. However, this assumption requires a formal
security analysis.


PKG Characteristic: In PicRS, PKG enjoys the stateless feature of Picnic and
therefore does not need to update its secret key after a signature, as is required
for XRS. The main advantage of PicRS over XRS is that PKG can theoretically
generate an infinite number of signing secret keys, and so can handle an infinite
number of users, while XRS is limited to 2^h users. Our implementation shown
in Table 1 sets h = 20 due to the computational complexity of generating an XMSS
tree (e.g. in the IDRS.Setup algorithm), which grows exponentially with h.


Comparison with Lattice-Based IDRS: Table 1 also highlights the competitiveness
of XRS and PicRS in terms of signature size compared with
lattice-based IDRS. It is important to highlight that none of the lattice-based
works gives a precise signature size. We estimated the signature size of the scheme
of Zhao et al. [33] according to their formula. We fixed their parameters to n = 1000 (n is
their security parameters for the short integer solution problem (SIS)), q = 27°, 
w = 3 and k = 41. Their estimated size is presented in Table 1. Regardless of the
difference in actual signature size, XRS and PicRS enjoy a nearly constant
signature size, while all current state-of-the-art lattice constructions [12,30,31,33]
have a signature size that increases linearly with the ring size N. Therefore, all of
our implementations shown in Table 1 are more suitable for large rings than the
lattice-based IDRS. Investigating the state-of-the-art traditional lattice-based
ring signature designed by Esgin et al. [18] could be promising future work to
improve the competitiveness of lattice-based IDRS.


Final Recommendations and Conclusion: Table 1 shows that XRS implemented
with Ligero++ is our most promising construction when an optimized
signature size is desired. It achieves a competitive signature size with the hash
functions Poseidon and MiMC, and even with the standard hash SHA3. ZK-STARK [6]
and Aurora [7] could be alternatives to Ligero++; they both achieve a proof
size slightly larger than Ligero++ but are still competitive. As illustrated in
Table 1, XRS outperforms PicRS with a smaller signature size and a smaller
SID, which comes from the difference in size between XMSS and Picnic. It is also
important to highlight that our possible constructions have been evaluated
theoretically, and it would be interesting to investigate their applicability with an
implementation. According to the original KKW [26] and Ligero++ [9] papers, the
circuit's complexity influences the running time and the memory complexity of the
signing and verification algorithms. This increases the advantage of XRS over PicRS.
Therefore, our final recommendation is to use XRS implemented either
with Poseidon and Ligero++, to achieve the best compromise between proof
size and security, or with KKW combined with LowMC, to ensure post-quantum
security.


A Definitions 


This section defines the algorithms of the primitives used. Their related security
definitions are presented in the full version [11].


Definition 2 (Digital signature). A digital signature scheme DS is composed
of the following algorithms:


(DS.pk, DS.sk) ← DS.KeyGen(1^λ): This takes as input the security parameter λ
and outputs the keypair (DS.pk, DS.sk).

DS.σ ← DS.Sign(m, DS.sk): This takes as inputs a message m to be signed and
a secret key DS.sk. It outputs a valid digital signature DS.σ.

0/1 ← DS.Verify(m, DS.σ, DS.pk): This takes as inputs the signed message m, a
digital signature DS.σ, and the public key DS.pk. It outputs 1 if DS.σ is valid
and 0 otherwise.


Definition 3 (Non-interactive zero-knowledge proof system (NIZK)).
A non-interactive zero-knowledge proof system (NIZK) [5] aims to prove that a
public statement x and a private witness w belong to a defined relation R (i.e.
(x, w) ∈ R). We also let L_R = {x | ∃w s.t. (x, w) ∈ R}. A NIZK consists of the
following three algorithms:


crs ← NIZK.Setup(1^λ): This generates the common reference string crs from the
security parameter λ.

π ← NIZK.Prove(crs, x, w): This generates a proof π for the common reference
string crs, the statement x and the witness w that satisfies the relation R (to
be more specific, we have (x, w) ∈ R).

0/1 ← NIZK.Verify(crs, x, π): This returns 1 if the proof π based on the common
reference string crs and the public statement x is valid, 0 otherwise.


Remark 1. In this paper, we omit the use of the common reference string crs. 


Definition 4 (Accumulator). An accumulator [10] A is defined by the
following algorithms:


A.pk ← A.Gen(1^λ): The setup algorithm takes as input the security parameter λ
and outputs the public key A.pk.

(A_X, A.pk) ← A.Eval(A.pk, X): The evaluation algorithm takes as inputs the
public key A.pk and the set X and outputs the accumulator A_X and an updated
public key A.pk for the new accumulated set X.

w_{x_i} / ⊥ ← A.WitGen(A.pk, A_X, X, x_i): The witness generation algorithm takes as
inputs the public key A.pk, the accumulator A_X, the set X, and an element
x_i. It outputs the witness w_{x_i} if x_i ∈ X and ⊥ otherwise.

0/1 ← A.Verify(A.pk, A_X, w_{x_i}, x_i): The verification algorithm takes as inputs the
public key A.pk, the accumulator A_X, the witness w_{x_i}, and the element x_i. It
outputs 1 if w_{x_i} is a valid witness for x_i ∈ X and 0 otherwise.


Definition 5 (Cryptographic Hash function). A cryptographic hash function


\[
H : \{0,1\}^{*} \rightarrow \{0,1\}^{2\lambda}
\tag{2}
\]


takes as input a message a of any length and outputs the hash value b of length
2λ bits. A cryptographic hash function fulfills the three following properties:


- Pre-image resistance (one-wayness): given a hash value b, where b = H(a)
for a uniformly random a ∈ {0,1}*, it is computationally infeasible (in
polynomial time) to find a such that b = H(a).

- Second pre-image resistance: knowing a pair (a0, H(a0)) for a uniformly
random a0 ∈ {0,1}*, it is computationally infeasible to find another input
a1 ∈ {0,1}* such that H(a1) = H(a0).

- Collision resistance: it is computationally infeasible to find two different
inputs a0 and a1 such that a0 ≠ a1 resulting in the same hash value
b = H(a0) = H(a1).
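
As a small illustration of the output length, the sketch below instantiates H with SHA3-256 from the Python standard library. Choosing SHA3-256 is an assumption made here so that the digest length matches 2λ bits for λ = 128; the constant name `SECURITY_LEVEL` is hypothetical.

```python
import hashlib

SECURITY_LEVEL = 128  # lambda, the post-quantum security level in bits


def H(a: bytes) -> bytes:
    """Hash an input of any length to a 2 * lambda = 256-bit value."""
    return hashlib.sha3_256(a).digest()


digest_bits = len(H(b"an input of any length")) * 8
```

Whatever the length of the input, `digest_bits` equals 2 · `SECURITY_LEVEL`.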